The LLVM Project Blog

GSoC 2024: Out-Of-Process Execution For Clang-Repl

Mon, 04 Nov 2024 00:00:00 +0000

Hello! I’m Sahil Patidar, and this summer I had the exciting opportunity to participate in Google Summer of Code (GSoC) 2024. My project revolved around enhancing Clang-Repl by introducing Out-Of-Process Execution.

Mentors: Vassil Vassilev and Matheus Izvekov

Project Background

Clang-Repl, part of the LLVM project, is a powerful interactive C++ interpreter using Just-In-Time (JIT) compilation. However, it faced two major issues: high resource consumption and instability. Running both Clang-Repl and JIT in the same process consumed excessive system resources, and any crash in user code would shut down the entire session.

To address these problems, Out-Of-Process Execution was introduced. By executing user code in a separate process, resource usage is reduced and crashes no longer affect the main session. This solution significantly enhances both the efficiency and stability of Clang-Repl, making it more reliable and suitable for a broader range of use cases, especially on resource-constrained systems.
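The isolation idea can be illustrated with a minimal POSIX sketch. This is not how Clang-Repl is implemented (it delegates to llvm-jitlink-executor through ORC's remote execution support); `run_isolated` and `IsolatedResult` are hypothetical names used only to show why a crash in a child process leaves the parent session alive.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cstdlib>
#include <functional>

// Outcome of running a task in a separate process (hypothetical helper type).
struct IsolatedResult {
  bool crashed;   // true if the child did not exit normally (e.g. killed by a signal)
  int exit_code;  // meaningful only when crashed == false
};

// Run `task` in a forked child so that a crash there cannot take down the caller.
inline IsolatedResult run_isolated(const std::function<int()> &task) {
  pid_t pid = fork();
  if (pid < 0)
    return {true, -1};  // fork failed; report it as a failure
  if (pid == 0)
    _exit(task());      // child: run the task and leave without unwinding the parent's state
  int status = 0;
  if (waitpid(pid, &status, 0) < 0)
    return {true, -1};
  if (WIFEXITED(status))
    return {false, WEXITSTATUS(status)};
  return {true, -1};    // terminated by a signal such as SIGSEGV or SIGABRT
}
```

Calling `run_isolated` with a task that calls `std::abort()` reports `crashed == true`, while the calling process keeps running and can start a fresh child for the next input.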

What We Accomplished

As part of my GSoC project, I’ve been focused on implementing out-of-process execution in Clang-Repl and enhancing the ORC JIT infrastructure to support this feature. Here is a breakdown of the key tasks and improvements I worked on:

Out-Of-Process Execution Support for Clang-Repl

PR: #110418

One of the primary objectives of my project was to implement out-of-process (OOP) execution capabilities within Clang-Repl, enabling it to execute code in a separate, isolated process. This feature leverages ORC JIT’s remote execution capabilities to enhance code execution flexibility by isolating runtime environments.

To enable OOP execution in Clang-Repl, I utilized the llvm-jitlink-executor, allowing Clang-Repl to offload code execution to a dedicated executor process. This setup introduces a layer of isolation between Clang-Repl’s main process and the code execution environment.

New Command-Line Flags:
To facilitate the out-of-process execution, I added two key command-line flags:
- --oop-executor: This flag starts a separate JIT executor process. The executor handles the actual code execution independently of the main Clang-Repl process.
- --oop-executor-connect: This flag establishes a communication link between Clang-Repl and the out-of-process executor. It allows Clang-Repl to transmit code to the executor and retrieve the results from the execution.

With these flags in place, Clang-Repl can utilize llvm-jitlink-executor to execute code in an isolated environment. This approach significantly enhances separation between the compilation and execution stages, increasing flexibility and ensuring a more secure and manageable execution process.

Issues Encountered

Block Dependence Calculation in ObjectLinkingLayer
Commit Link
Code Example
```
clang-repl> int f() {return 1;}
clang-repl> int f1() {return f();}
clang-repl> f1();
error: disconnecting
clang-repl> JIT session error: FD-transport disconnected
JIT session error: disconnecting
JIT session error: FD-transport disconnected
JIT session error: Failed to materialize symbols: { (main, { __Z2fv }) }
disconnecting
```
During my work on clang-repl, I encountered an issue where the JIT session would crash during incremental compilation. The root cause was a bug in ObjectLinkingLayer::computeBlockNonLocalDeps. The problem arose from the way the worklist was built: it was being populated within the same loop that records immediate dependencies and dependants, which caused some blocks to be missed from the worklist. This bug was fixed by Lang Hames.

ORC JIT Enhancements

As part of the OOP execution work, several improvements were made to ORC JIT, the underlying framework responsible for dynamic compilation and execution of code in Clang-Repl. These improvements target better handling of incremental execution, especially for Mach-O and ELF platforms, and ensuring that initializers are properly managed across different execution environments.

Incremental Initializer Execution for Mach-O and ELF
PRs: #97441, #110406
In a typical JIT execution environment, the dlopen function is used to handle code mapping, reference counting, and initializer execution for dynamically loaded libraries. However, this approach is often too broad for interactive environments like Clang-Repl, where we only need to execute newly introduced initializers rather than reinitializing everything. To address this, I introduced the dlupdate function in the ORC runtime.
The dlupdate function is a targeted solution that focuses solely on running new initializers added during a REPL session. Unlike dlopen, which handles a variety of tasks and can lead to unnecessary overhead, dlupdate only triggers the execution of newly registered initializers, avoiding redundant operations. This improvement is particularly beneficial in interactive settings like Clang-Repl, where code is frequently updated in small increments.
By streamlining the execution of initializers, this change significantly improves the efficiency of Clang-Repl.
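The "run only what is new" behaviour can be sketched as follows. This is an illustrative model, not the ORC runtime's code: `InitializerTracker`, `add` and `run_new` are hypothetical names, and the real dlupdate works on per-JITDylib runtime state rather than on a vector of callables.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical tracker that remembers which initializers have already been executed.
class InitializerTracker {
public:
  // Register an initializer introduced by a new chunk of REPL input.
  void add(std::function<void()> init) {
    assert(init && "initializer must be callable");
    inits_.push_back(std::move(init));
  }

  // Run only the initializers registered since the previous call; returns how many ran.
  std::size_t run_new() {
    std::size_t ran = 0;
    while (next_ < inits_.size()) {
      // Copy the callable out first so that an initializer which registers
      // further initializers cannot invalidate the one being executed.
      std::function<void()> init = inits_[next_++];
      init();
      ++ran;
    }
    return ran;
  }

private:
  std::vector<std::function<void()>> inits_;
  std::size_t next_ = 0;  // index of the first initializer that has not run yet
};
```

A second call to `run_new()` without any new registration executes nothing, which is the property that distinguishes this incremental style from re-running every initializer on each update.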
Push-Request Model for ELF Initializers
PR: #102846
A push-request model has been introduced to manage ELF initializers within the runtime state for each JITDylib, similar to how initializers are handled for Mach-O and COFF. Previously, ELF required a fresh request for initializers with each invocation of dlopen, but lacked mechanisms to register, deregister, or retain these initializers. This created issues during subsequent dlopen calls, as initializers were erased after the rt_getInitializers function was invoked, making further executions impossible.
To resolve these issues, the following functions were introduced:
- __orc_rt_elfnix_register_init_sections: Registers ELF initializers for the JITDylib.
- __orc_rt_elfnix_register_jitdylib: Registers the JITDylib with the ELF runtime state.
With the new push-request model, the management and tracking of initializers for each JITDylib state are now more efficient. By leveraging Mach-O’s RecordSectionsTracker, only newly registered initializers are executed, greatly improving efficiency and reliability when working with ELF targets in clang-repl.
This update is crucial for enabling out-of-process execution in clang-repl on ELF platforms, offering a more effective approach to managing incremental execution.

Additional Improvements

Beyond the main enhancements to Clang-Repl and ORC JIT, I also worked on several other improvements:

Auto-loading Dynamic Libraries in ORC JIT.
PR: #109913 (On-going)
With this update, we’ve introduced a new feature to the ORC executor and controller: automatic loading of dynamic libraries in the ORC JIT. This enhancement enables efficient resolution of symbols from both loaded and unloaded libraries.
- How It Works:
  - Symbol Lookup: When a lookup request is made, the system first attempts to resolve the symbol from already loaded libraries.
  - Unloaded Libraries Scan: If the symbol is not found in any loaded library, the system then scans the unloaded dynamic libraries to locate it.
- Key Addition: Global Bloom Filter
  A significant improvement in this update is the introduction of a Global Bloom Filter. When a symbol cannot be resolved in the loaded libraries, the symbol tables from the scanned libraries are incorporated into this filter. If the symbol is still not found, the bloom filter’s result is returned to the controller, allowing it to skip checking for symbols that do not exist in the global table during future lookups.
Additionally, the system tracks symbols that were previously thought to be present but are actually absent in both loaded and unloaded libraries. With these enhancements, symbol resolution is significantly faster, as the bloom filter helps prevent unnecessary lookups, thereby improving efficiency for both loaded and unloaded dynamic libraries.
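For readers unfamiliar with Bloom filters, here is a minimal sketch of the data structure. The class name, bit-array size, number of hash functions and hashing scheme are all illustrative assumptions and are not taken from the PR; the point is only that a negative answer is definitive, so the lookup can be skipped.

```cpp
#include <bitset>
#include <cstddef>
#include <functional>
#include <string>

// Minimal Bloom filter over symbol names (sizes and hashing are illustrative only).
class SymbolBloomFilter {
public:
  // Record that `name` exists in some scanned symbol table.
  void insert(const std::string &name) {
    for (std::size_t i = 0; i < kHashes; ++i)
      bits_.set(index(name, i));
  }

  // false: the symbol is definitely absent; true: it may be present.
  bool may_contain(const std::string &name) const {
    for (std::size_t i = 0; i < kHashes; ++i)
      if (!bits_.test(index(name, i)))
        return false;
    return true;
  }

private:
  static constexpr std::size_t kBits = 1u << 16;
  static constexpr std::size_t kHashes = 3;

  // Derive the i-th bit position from two base hashes (double hashing).
  static std::size_t index(const std::string &name, std::size_t i) {
    std::size_t h1 = std::hash<std::string>{}(name);
    std::size_t h2 = std::hash<std::string>{}(name + '#') | 1u;
    return (h1 + i * h2) % kBits;
  }

  std::bitset<kBits> bits_;
};
```

Because false positives are possible but false negatives are not, a `true` answer still requires a real lookup, whereas a `false` answer lets the caller skip it entirely.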
Refactor of dlupdate Function
PR: #110491
This update simplifies the dlupdate function by removing the mode argument, streamlining the function’s interface. The change enhances the clarity and usability of dlupdate by reducing unnecessary parameters, improving the overall maintainability of the code.

Benchmarks: In-Process vs Out-of-Process Execution

Result

With these changes, clang-repl now supports out-of-process execution. We can run it using the following command:

clang-repl --oop-executor=path/to/llvm-jitlink-executor --orc-runtime=path/to/liborc_rt.a

Future Work

Crash Recovery and Session Continuation: Investigate and develop ways to enhance crash recovery so that if something goes wrong, the session can seamlessly resume without losing progress. This involves exploring options for an automatic process to restart the executor in the event of a crash.
Finalize Auto Library Loading in ORC JIT: Wrap up the feature that automatically loads libraries in ORC JIT. This will streamline symbol resolution for both loaded and unloaded dynamic libraries by ensuring that any required dylibs containing symbol definitions are loaded as needed.

Conclusion

With this project, Clang-Repl now supports out-of-process execution for both ELF and Mach-O, making it much more efficient and stable, especially on devices with limited resources.

In the future, I plan to work on automating library loading and improving ORC-JIT to make Clang-Repl’s out-of-process execution even better.

Acknowledgements

I would like to thank Google Summer of Code (GSoC) and the LLVM community for providing me with this amazing opportunity. Special thanks to my mentors, Vassil Vassilev and Matheus Izvekov, for their continuous support and guidance. I am also deeply grateful to Lang Hames for sharing their expertise on ORC-JIT and helping improve clang-repl. This experience has been a major step in my development, and I look forward to continuing my contributions to open source.

GSoC 2024: The 1001 thresholds in LLVM

Mon, 21 Oct 2024 00:00:00 +0000

Hey everyone! My name is Shourya and I worked on LLVM this summer through GSoC. My project is called The 1001 thresholds in LLVM. The main objective of this project was to study how varying different thresholds in LLVM affects performance parameters such as compile time, bitcode size, execution time, and LLVM statistics.

Background

LLVM has lots of thresholds and flags to avoid “costly cases”. However, it is unclear whether these thresholds are useful, whether their values are reasonable, and what impact they really have. Since there are so many, a simple exhaustive search is not feasible. An example of work in this direction is the introduction of a C++ class that can replace hardcoded values and offers control over the threshold, e.g., one can increase the recursion limit via a command line flag from the hardcoded “6” to a different number. As such, there is a need to explore different thresholds in LLVM, understand what it means for a threshold to be hit, profile different thresholds, and select optimal values for them.

What We Did

This work provides a tool that can efficiently explore these knobs and understand how modifying them affects metrics like compile time, size of the generated program, or any statistic that LLVM emits like “Number of loops vectorized”. (Note that execution-time is currently not evaluated because input-gen does not work on optimized IR and is thus part of future work.)

We first built a Clang matcher that looks for the following patterns to identify the knobs in the codebase:

const knob_name = knob_val
cl::init
enum {knob_name = knob_val}

We then used a custom Python tool (optimised to deal with I/O and cache bottlenecks) to collect the different stat values in parallel and store them in a JSON file. After manual selection of interesting knobs, we have so far conducted three studies in which we measure compile-time and bitcode-size along with various other statistics, and present them in the form of interactive graphs. Two of them (on 10,000 and 100 bitcode files) look at average statistics for each knob value, while the third one (on 10,000 bitcode files) studies how each file is affected individually by changing knob values. We see some very interesting patterns in these graphs. For instance, in the following two graphs, representing the jump-threading-threshold, we can observe improved statistics (top graph) and decreased average compile time (bottom graph) if the knob value is increased.

Results

The per-file study shows that there is no single magic knob value: the optimum, with regard to compile time or code size, depends on the file that is currently being compiled. For instance, here we can see that different knob values (for the knob licm-mssa-optimization-cap) give good cumulative compile time improvements for different files. In detail, the largest group of files benefits most from a knob value of 300, while 60 is the best knob value for the second-largest group.

We further show that the presence of an oracle that can tell the best knob value for each file can significantly improve the cumulative compile time.
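The gap between a fixed knob value and a per-file oracle can be made concrete with a small calculation. The sketch below assumes a hypothetical table `times[f][k]` holding the compile time of file `f` under the `k`-th candidate knob value; the names and data layout are illustrative and do not come from the project's tooling.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// times[f][k]: compile time of file f with the k-th candidate knob value.
// The table is assumed to be rectangular (same candidates for every file).
using TimeTable = std::vector<std::vector<double>>;

// Cumulative compile time when the single best knob value is used for every file.
inline double best_single_knob_total(const TimeTable &times) {
  if (times.empty())
    return 0.0;
  double best = std::numeric_limits<double>::infinity();
  for (std::size_t k = 0; k < times[0].size(); ++k) {
    double total = 0.0;
    for (const auto &row : times) {
      assert(row.size() == times[0].size() && "table must be rectangular");
      total += row[k];
    }
    best = std::min(best, total);
  }
  return best;
}

// Cumulative compile time when an oracle picks the best knob value for each file.
inline double oracle_total(const TimeTable &times) {
  double total = 0.0;
  for (const auto &row : times)
    if (!row.empty())
      total += *std::min_element(row.begin(), row.end());
  return total;
}
```

The oracle total can never exceed the best single-knob total, and the difference between the two is the headroom that a per-file selection mechanism could recover.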

In this project, we explored various thresholds in LLVM (specifically, 93 thresholds found using the Clang matcher; a 100-file study for each can be found here) and observed that these thresholds are largely file-specific. This indicates that there is no universally optimal value, or even a set of values, that can be applied across different scenarios. Instead, what is needed is an adaptive mechanism within LLVM, an oracle, that can dynamically determine the appropriate threshold values during compilation.

We also experimented with varying thresholds cumulatively by leveraging file-specific information through an LLVM pass. However, after discussions with the mentors, this approach was set aside due to the significant changes it would necessitate across other parts of the LLVM codebase.

As a result, we have not yet categorized different thresholds, such as identifying optimal threshold values for specific file types (e.g., I/O-intensive files). Nonetheless, we provide a tool that can efficiently collect this data (LLVM statistics, bitcode-size and compile-time) and help visualize it with the help of interactive graphs as well as histograms that examine these variations on a per-file basis. Additionally, a correlation table between knob values and performance metrics further illustrates the significant impact this study could have on improving LLVM’s overall performance.

Future Work

The early results show that we need a better understanding of knob values to maximise various objectives. Our results will provide the community with the first step in developing a guided compilation model attuned to the file that is being compiled. We further intend to show how these knobs interact with each other and whether modifying multiple knobs together compounds the benefits or not. One more area of work could be on input-gen, which would enable us to collect and study execution-time in our performance parameters.

Acknowledgements

This project would not have been possible without my amazing mentors, Jan Hückelheim, Johannes Doerfert, the LLVM Foundation admins, and the GSoC admins.

Links

Code

Studies

GSoC Project Page

GSoC 2024: 3-way comparison intrinsics

Mon, 07 Oct 2024 00:00:00 +0000

Hello everyone! My name is Volodymyr, and in this post I would like to talk about the project I have been working on for the past couple of months as part of Google Summer of Code 2024. The aim of the project was to introduce 3-way comparison intrinsics to LLVM IR and add a decent level of optimizations for them.

Background

Three-way comparison is an operation present in many high-level languages, such as C++ and its spaceship operator or Rust and the Ord trait. It operates on two values for which there is a defined comparison operation and returns -1 if the first operand is less than the second, 0 if they are equal, and 1 otherwise. At the moment, compilers that use LLVM express this operation using different sequences of instructions which are optimized and lowered individually rather than as a single operation. Adding an intrinsic for this operation would therefore help us generate better machine code on some targets, as well as potentially optimize patterns in the middle-end that we didn’t optimize before.

What was done

Over the course of the project I have added two new intrinsics to the LLVM IR: llvm.ucmp for an unsigned 3-way comparison and llvm.scmp for a signed comparison. They both take two arguments that must be integers or vectors of integers and return an integer or a vector of integers with the same number of elements. The arguments and the result do not need to have the same type.

In the middle-end the following passes received some support for these intrinsics:

InstSimplify (#1, #2)
InstCombine (#1, #2, #3, #4, #5)
CorrelatedValuePropagation
ConstraintElimination

I have also added folds of idiomatic ways that a 3-way comparison can be expressed to a call to the corresponding intrinsic.

In the backend there are two different ways of expanding the intrinsics: as a nested select (i.e. (x < y) ? -1 : (x > y ? 1 : 0)) or as a subtraction of zero-extended comparisons (zext(x > y) - zext(x < y)). The second option is the default one, but targets can choose to use the first one through a TLI hook.
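The two expansions are equivalent, which a short C++ sketch can make explicit. The function names are illustrative, and the 32-bit operand and 8-bit result types are just one of the type combinations the intrinsics allow.

```cpp
#include <cstdint>

// Nested-select expansion: (x < y) ? -1 : (x > y ? 1 : 0)
constexpr std::int8_t ucmp_select(std::uint32_t x, std::uint32_t y) {
  return static_cast<std::int8_t>(x < y ? -1 : (x > y ? 1 : 0));
}

// Subtraction of zero-extended comparisons: zext(x > y) - zext(x < y)
constexpr std::int8_t ucmp_sub(std::uint32_t x, std::uint32_t y) {
  return static_cast<std::int8_t>(static_cast<int>(x > y) - static_cast<int>(x < y));
}

// Signed variant of the subtraction form; only the comparison signedness changes.
constexpr std::int8_t scmp_sub(std::int32_t x, std::int32_t y) {
  return static_cast<std::int8_t>(static_cast<int>(x > y) - static_cast<int>(x < y));
}

// Both unsigned expansions agree on less-than, equal, and greater-than inputs.
static_assert(ucmp_select(1u, 2u) == ucmp_sub(1u, 2u), "less-than case");
static_assert(ucmp_select(2u, 2u) == ucmp_sub(2u, 2u), "equal case");
static_assert(ucmp_select(3u, 2u) == ucmp_sub(3u, 2u), "greater-than case");
```

The difference between signed and unsigned shows up on inputs such as `0xFFFFFFFF` versus `1`: the unsigned comparison treats the first operand as the larger one, while the signed comparison of `-1` and `1` treats it as the smaller one.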

Results

I think that overall the project was successful and brought a small positive change to LLVM. To demonstrate its impact in a small test case, the following function in C++ that uses the spaceship operator was compiled twice, first with Clang 18.1 and then with Clang built from the main branch of LLVM repository:

#include <compare>

std::strong_ordering cmp(unsigned int a, unsigned int b)
{
  return a <=> b;
}

With Clang 18.1:

; ====== LLVM IR ======
define i8 @cmp(i32 %a, i32 %b) {
entry:
  %cmp.lt = icmp ult i32 %a, %b
  %sel.lt = select i1 %cmp.lt, i8 -1, i8 1
  %cmp.eq = icmp eq i32 %a, %b
  %sel.eq = select i1 %cmp.eq, i8 0, i8 %sel.lt
  ret i8 %sel.eq
}

; ====== x86_64 assembly ======
cmp:
  xor ecx, ecx
  cmp edi, esi
  mov eax, 0
  sbb eax, eax
  or al, 1
  cmp edi, esi
  movzx eax, al
  cmove eax, ecx
  ret

With freshly built Clang:

; ====== LLVM IR ======
define i8 @cmp(i32 %a, i32 %b) {
entry:
  %sel.eq = tail call i8 @llvm.ucmp.i8.i32(i32 %a, i32 %b)
  ret i8 %sel.eq
}

; ====== x86_64 assembly ======
cmp:
  cmp edi, esi
  seta al
  sbb al, 0
  ret

As you can see, the number of instructions in the generated code has gone down considerably (from 8 to 3, excluding ret). Although this is only a small synthetic test, it can still make a noticeable impact if code like this is found in a hot path somewhere.

The impact of these changes on real-world code is much harder to quantify. Looking at llvm-opt-benchmark, there are quite a few places where the intrinsics are being used, which suggests that some improvement must have taken place, although it is unlikely to be significant in all but very few cases.

Future Work

There are still many opportunities for optimization in the middle-end, some of which are already known and being worked on at the time of writing, while others are yet to be discovered. I would also like to allow pointers and vectors of pointers to be valid operands for the intrinsics, although that would be quite a minor change. In the backend I would also like to work on better handling of the intrinsics in GlobalISel, which is something that I didn’t have enough time for and that other members of the LLVM community helped me with.

Acknowledgements

None of this would have been possible without my two amazing mentors, Nikita Popov and Dhruv Chawla, and the LLVM community as a whole. Thank you for helping me on this journey and I am looking forward to working with you in the future.

GSoC 2024: ABI Lowering in ClangIR

Mon, 30 Sep 2024 00:00:00 +0000

ClangIR is an ongoing effort to build a high-level intermediate representation (IR) for C/C++ within the LLVM ecosystem. Its key advantage lies in its ability to retain more source code information. While ClangIR is making progress, it still lacks certain features, notably ABI handling. Currently, ClangIR lowers most functions without accounting for ABI-specific calling convention details.

Goals

The “Build & Run SingleSource Benchmarks with ClangIR - Part 2” Google Summer of Code 2024 builds on my contributions from GSoC 2023 by addressing one of the main issues I encountered: target-specific lowering. It focuses on extending ClangIR’s code generation capabilities, particularly in ABI-lowering for X86-64. Several tests rely on operations and types (e.g., va_arg calls and complex data types) that require target-specific information to compile correctly.

The concrete steps to achieve this were:

Implement foundational infrastructure that can scale to multiple architectures while adhering to ClangIR design principles such as CodeGen parity, feature guarding, and AST backreferences.
Handle basic calling convention scenarios as a proof of concept to validate the foundational infrastructure.
Add lowering for a second architecture to further validate the infrastructure’s extensibility to multiple architectures.
Unify target-specific ClangIR lowering into the library, as there are a few isolated methods handling target-specific code lowering like cir.va_arg.
Integrate calling convention lowering into the main pipeline to ensure future contributions and continued development of this infrastructure.

Contributions

The list of contributions (PRs) can be found here.

Target Lowering Library

The most significant contribution of this project was the development of a modular TargetLowering library. This ensures that target-specific MLIR lowering passes can leverage this shared library for lowering logic. The library also follows ClangIR’s feature guarding principles, ensuring that any contributor can refer to the original CodeGen for contributions, and any unimplemented feature is asserted at specific code points, making it easy to track missing functionality.

Calling Convention Lowering Pass

As a proof of concept, the initial development of the TargetLowering library focused on implementing a calling convention lowering pass that targets multiple architectures. Currently, ClangIR ignores the target ABI during CodeGen to retain high-level information. For example, structs are not unraveled to improve argument-passing efficiency. ABI-specific LLVM attributes are also ignored. This pass addresses these issues by properly tagging LLVM attributes and rewriting function definitions and calls to handle unraveled structs. This was implemented for both X86-64 and AArch64, demonstrating the library’s multi-architecture support.
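To give a feel for what such a rewrite does, here is a source-level sketch of one well-known case: under the System V x86-64 ABI, a struct of two 32-bit integers fits in a single 64-bit integer register. The code only mimics that idea in plain C++; `Pair`, `coerce_to_i64`, `rebuild` and `sum_lowered` are hypothetical names, and the actual pass operates on ClangIR operations rather than on C++ source.

```cpp
#include <cstdint>
#include <cstring>

// High-level view: the function conceptually takes a struct by value.
struct Pair {
  std::int32_t a;
  std::int32_t b;
};
static_assert(sizeof(Pair) == 8, "sketch assumes the struct fits in one 64-bit register");

// Caller side after lowering: the struct is coerced into a single 64-bit value.
inline std::uint64_t coerce_to_i64(const Pair &p) {
  std::uint64_t raw = 0;
  std::memcpy(&raw, &p, sizeof raw);
  return raw;
}

// Callee side after lowering: the struct is rebuilt from the coerced argument.
inline Pair rebuild(std::uint64_t raw) {
  Pair p;
  std::memcpy(&p, &raw, sizeof p);
  return p;
}

// Lowered form of a hypothetical `int sum(Pair p)`: the signature now takes an integer.
inline std::int32_t sum_lowered(std::uint64_t raw) {
  Pair p = rebuild(raw);
  return p.a + p.b;
}
```

Both the function definition and every call site have to change together, which is why the pass rewrites definitions and calls rather than only one of them.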

Shortcomings

Target-Specific Lowering Unification

While some target-specific lowering code was moved into the library, it was copied and pasted rather than properly integrated. This is not ideal for leveraging the library’s multi-architecture features.

Inclusion in the Main Pipeline

This is still a work in progress, as the library is not yet mature enough to handle most pre-existing ClangIR tests. There are also feature guards with unreachable statements for many unimplemented features.

Future Work

Now that there is a base infrastructure for handling target-agnostic to target-specific CIR code, there is a large amount of future work to be done, including:

Improving DataLayout-related queries using MLIR’s built-in tools.
Implementing calling convention lowering for additional types, such as pointers.
Extending the TargetLowering library to support more architectures.
Unifying remaining target-specific lowering code from other parts of ClangIR.

Acknowledgements

I would like to thank my Google Summer of Code mentors, Bruno Cardoso Lopes and Nathan Lanza, for another great GSoC experience. I also want to thank the LLVM community and Google for organizing the program.

GSoC 2024: Statistical Analysis of LLVM-IR Compilation

Mon, 23 Sep 2024 00:00:00 +0000

Welcome! My name is Andrew and I contributed to LLVM through the 2024 Google Summer of Code Program. My project is called Statistical Analysis of LLVM-IR Compilation. The objective of this project is to provide an analysis of how time is spent in the optimization pipeline. Generally, drastic differences in the percentage of time spent by a pass in the pipeline are considered abnormal.

Background

In principle, an LLVM IR bitcode file, or module, contains IR features that determine the behavior of the compiler optimization pipeline. Depending on these features, the optimization pipeline, opt, can add significantly or only marginally to the compilation time. More specifically, optimizations may complete in less or more time; the user may wait for a microsecond or for a few minutes. LLVM compiler developers constantly edit the pipeline, so the performance of these optimizations can vary by compiler version (sometimes significantly).

Having a large IR dataset such as ComPile allows for testing the LLVM compilation pipeline on a varied sample of IR. The size of this sample is sufficient to determine outlying IR modules. By identifying and examining such files using utilities which are being added to the LLVM IR Dataset Utils Repo, the causes of unexpected compilation times can be determined. Developers can then modify and improve the compilation pipeline accordingly.

Summary of Work

The utilities added in PR37 are intended to write each IR module to a tar file corresponding to a programming language. Each file written to the tar files is indexed by its location in the HF dataset. This allows easy identification of files for tools which can be used for data extraction and analysis in the shell, notably clang. Tar file creation allows for potentially using less storage space than downloading the HF dataset to disk, and it allows code to be written which does not depend on the Python interpreter to load the dataset for access.

The Makefile from PR36 is responsible for carrying out the data collection. This data includes text segment size, user CPU instruction counts during compile time (analogous to time), IR feature counts sourced from the LLVM pass print<func-properties>, and maximum relative time pass names and percentage counts. The data can be extracted in parallel or serially and is stored in a CSV file.

An important data collection command in the Makefile is clang -w -c -ftime-report $(lang)/bc_files/[email protected] -o /dev/null. The output from the command is large, but the part of interest is the first Pass execution timing report:

===-------------------------------------------------------------------------===
                          Pass execution timing report
===-------------------------------------------------------------------------===
  Total Execution Time: 2.2547 seconds (2.2552 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   2.1722 ( 96.5%)   0.0019 ( 47.5%)   2.1741 ( 96.4%)   2.1745 ( 96.4%)  VerifierPass
   0.0726 (  3.2%)   0.0000 (  0.0%)   0.0726 (  3.2%)   0.0726 (  3.2%)  AlwaysInlinerPass
   0.0042 (  0.2%)   0.0015 ( 39.2%)   0.0058 (  0.3%)   0.0058 (  0.3%)  AnnotationRemarksPass
   0.0014 (  0.1%)   0.0005 ( 13.3%)   0.0019 (  0.1%)   0.0020 (  0.1%)  EntryExitInstrumenterPass
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)  CoroConditionalWrapper
   2.2507 (100.0%)   0.0039 (100.0%)   2.2547 (100.0%)   2.2552 (100.0%)  Total
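Extracting the pass with the maximum relative time from such a report could look roughly like the sketch below. The project itself does this from a Makefile; this C++ version, including the `PassTime`, `parse_row` and `dominant_pass` names, is purely illustrative and assumes each data row starts with the user time in seconds and ends with the pass name.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One data row of the report: pass name and user time in seconds.
struct PassTime {
  std::string name;
  double user_seconds;
};

// Parse a single report line. Returns false for headers, separators,
// blank lines, and the final "Total" row.
inline bool parse_row(const std::string &line, PassTime &out) {
  std::istringstream in(line);
  double seconds = 0.0;
  if (!(in >> seconds))
    return false;  // line does not start with a number
  std::string token, last;
  while (in >> token)
    last = token;  // the pass name is the last whitespace-separated token
  if (last.empty() || last == "Total")
    return false;
  out = PassTime{last, seconds};
  return true;
}

// Return the pass with the largest user time (empty name if no row parsed).
inline PassTime dominant_pass(const std::vector<std::string> &lines) {
  PassTime best{"", 0.0};
  PassTime row{"", 0.0};
  for (const std::string &line : lines)
    if (parse_row(line, row) && row.user_seconds > best.user_seconds)
      best = row;
  return best;
}
```

Applied to the rows of the report above, `dominant_pass` selects VerifierPass, the pass that accounts for 96.5% of user time in that example.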

A user can visually see the distribution of these passes by using a profiling tool for .json files. The .json file for a given bitcode file is obtained by clang -c -ftime-trace <file>.

The visualization of this output can be filtered to the passes of interest as in the following image:

The CoroConditionalWrapper pass is accounted for by the “Total CoroConditionalWrapper” block. Clearly, that pass takes a far smaller amount of time than the others, as shown in the pass execution timing report. However, instead of seeing the pass as an insignificant percentage of time, the visualization allows for additional comparisons of the relative timings of each pass. The example image has the optimization passes of interest selected, but the .json file provides information on the entire compilation pipeline as well. Thus, the entire pipeline execution flow can be visualized.

Current Status

Currently, there are three PRs that require approval to be merged. There has been ongoing discussion on their contents, so few steps should be left to merge them. In the current state, users of the utilities in PR38 should be able to readily reproduce the quantitative results I obtained for my GSoC midterm presentation graphs. Users can also easily perform outlier analysis on the IR files (excluding Julia IR). Some of the results include the following:

Scatter Plot of C IR Files:

Table of outliers for C IR files:

Future Work

It was discussed in PR 37 to consolidate the tar file creation into the dataset file writer Python script. This is a feature I wish to implement in order to speed up the tar file creation process by having the bitcode files written from memory to the tar instead of from memory, to disk, to tar.

As mentioned, Julia IR was not analyzed. Modifying the scripts to include Julia IR results is desirable to make complete use of the dataset. Adding additional documentation for demonstration-of-use purposes could help clarify ways to use the tools.

Additionally, outlier analysis can be expanded upon by using more advanced outlier detection methods. Not all the data collected in the CSV files was used, so using those extra features, in particular the output of the print<func-properties> pass, can allow for improved accuracy in outlier detection.

Acknowledgements

I would like to thank my mentors Johannes Doerfert and Aiden Grossman for their constant support during and prior to the GSoC program. Additionally, I would like to acknowledge the work of the LLVM Foundation admins and the GSoC admins.

GSoC 2024: Reviving NewGVN

Mon, 16 Sep 2024 00:00:00 +0000

This summer I participated in GSoC under the LLVM Compiler Infrastructure. The goal of the project was to improve the NewGVN pass so that it can replace GVN as the main value numbering pass in LLVM.

Background

Global Value Numbering (GVN) consists of assigning value numbers such that instructions with the same value number are equivalent. NewGVN was introduced in 2016 to replace GVN. We now highlight a few aspects in which NewGVN is better than GVN.

A key advantage of NewGVN over GVN is that it is complete for loops, while GVN is only complete for acyclic code. NewGVN is complete for loops because when it first processes loops, it assumes that only the first iteration will be executed, later corroborating these assumptions. This is known as the optimistic assumption. In practice, the optimistic assumption boils down to assuming that backedges are unreachable and, consequently, that when evaluating phi instructions, the values carried by them can be ignored. For instance, in the example below, %a is optimistically evaluated to 0. This leads to evaluating %c to %x, which in turn leads to evaluating %a.i to 0. At this point, there are two possibilities: either the assumption was correct and the loop actually only executes once, and the value numbers computed so far are correct, or the instructions in the loop need to be reevaluated. Assume, for this example, that NewGVN could not prove that only one iteration is executed. Then %a once again evaluates to 0, and all other registers also evaluate to the same values as before. Thanks to the optimistic assumption, we were able to discover that %a is loop-invariant and, moreover, that it is equal to 0.

define i32 @optimistic(i32 %x, i32 %y){
entry:
  br label %loop
loop:
  %a = phi i32 [0, %entry], [%a.i, %loop]
  ...
  %c = xor i32 %x, %a
  %a.i = sub i32 %x, %c
  br i1 ..., label %loop,label %exit
exit:
  ret i32 %a
}

On the other hand, GVN fails to detect this equivalence because it would pessimistically evaluate %a to itself, and the previously described evaluation steps would never take place.
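The claim that %a stays at 0 on every iteration can be checked with a direct C++ transcription of the loop body. The function below is an illustrative analogue of @optimistic, not NewGVN code, and it uses unsigned arithmetic so that the subtraction wraps exactly as the IR `sub` does.

```cpp
#include <cstdint>

// Source-level analogue of the loop in @optimistic: returns the value carried
// by the phi %a after the loop body has run `iterations` times.
inline std::uint32_t loop_value_of_a(std::uint32_t x, unsigned iterations) {
  std::uint32_t a = 0;          // %a = phi i32 [0, %entry], [%a.i, %loop]
  for (unsigned i = 0; i < iterations; ++i) {
    std::uint32_t c = x ^ a;    // %c = xor i32 %x, %a    (equals x while a == 0)
    a = x - c;                  // %a.i = sub i32 %x, %c  (equals 0 while c == x)
  }
  return a;
}
```

Since `x ^ 0 == x` and `x - x == 0`, the value fed back into the phi is always 0, which is exactly the fixed point the optimistic evaluation discovers and the pessimistic one cannot.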

Another advantage of NewGVN is the value numbering of memory operations using MemorySSA. It provides a functional view of memory where instructions that can modify memory produce a new memory version, which is then used by other memory operations. This greatly simplifies the detection of redundancies among memory operations. For example, two loads of the same type from equivalent pointers and memory versions are trivially equivalent.

define i32 @foo(i32 %v, ptr %p) {
entry:
; 1 = MemoryDef(liveOnEntry)
  store i32 %v, ptr %p, align 4
; MemoryUse(1)
  %a = load i32, ptr %p, align 4
; MemoryUse(1)
  %b = load i32, ptr %p, align 4
; 2 = MemoryDef(1)
  call void @f(i32 %a)
; MemoryUse(2)
  %c = load i32, ptr %p, align 4
  %d = sub i32 %b, %c
  ret i32 %d
}

In the example above (annotated with MemorySSA), %a and %b are equivalent, while %c is not. All three loads are of the same type from the same pointer, but they don’t all load from the same memory state. Loads %a and %b load from the memory defined by the store (Memory 1), while %c loads from the memory defined by the function call (Memory 2). GVN can also detect these redundancies, but it relies on the more expensive and less general MemoryDependenceAnalysis.
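As a rough illustration of why the memory version makes this check simple, the following sketch groups loads by type, pointer, and memory version. It is a toy model and not how NewGVN represents loads; the Load tuple and number_loads are hypothetical names.

```python
from collections import namedtuple

# A load is described by its result name, loaded type, pointer operand and
# the MemorySSA version it reads from.
Load = namedtuple("Load", "name type pointer memory_version")

def number_loads(loads):
    """Group loads that agree on type, pointer and memory version."""
    classes = {}
    for load in loads:
        key = (load.type, load.pointer, load.memory_version)
        classes.setdefault(key, []).append(load.name)
    return list(classes.values())

# The three loads of @foo, with the versions from the MemorySSA annotations.
loads = [
    Load("%a", "i32", "%p", 1),
    Load("%b", "i32", "%p", 1),
    Load("%c", "i32", "%p", 2),
]
print(number_loads(loads))  # [['%a', '%b'], ['%c']]
```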

Despite these and other improvements, NewGVN is still not widely used, mainly because it lacks partial redundancy elimination (PRE) and because it is bug-ridden.

Implementing PRE

Our main contribution was the development of a PRE stage for NewGVN (found here). Our solution relied on generalizing Phi-of-Ops, an existing part of NewGVN that performs a special case of PRE where the instruction depends on a phi instruction and an equivalent value is available on every reaching path. This is achieved in two steps: phi-translation and phi-insertion.

Phi-translation consists of evaluating the original instruction in the context of each of its block’s predecessors. Phi operands are replaced by the value incoming from the predecessor. The value is available in the predecessor if the translated instruction is equivalent to a constant, function argument, or another instruction that dominates the predecessor.

Phi-insertion occurs after phi-translation if the value is available in every predecessor. At that point, a phi of the equivalent values is constructed and used to replace the original instruction. The full process is illustrated in the following example.

Our generalization eliminated the need for a dependent phi and introduced the ability to insert the missing values in cases where the instruction is partially redundant. To prevent increases in code size (ignoring the inserted phi instructions), the insertion is only made if it's the only one required. The full process is illustrated in the following example.
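The two steps and the single-insertion rule can be sketched as follows. This is a simplified model rather than the actual implementation: availability is a plain lookup table per predecessor instead of a dominance query, and the names phi_translate, pre, and %pre.<pred> are invented for the example.

```python
def phi_translate(expr, pred):
    """Rewrite expr for predecessor pred.

    An expression is a tuple (opcode, operand, ...). A phi operand is a dict
    mapping each predecessor to its incoming value; it is replaced by the
    value incoming from pred. Other operands are kept as they are.
    """
    op, *operands = expr
    return (op, *[o[pred] if isinstance(o, dict) else o for o in operands])

def pre(expr, preds, available):
    """Try to replace expr by a phi of equivalent values.

    available maps each predecessor to {translated expression: value name}.
    Returns (phi incoming values, insertions), or None when more than one
    insertion would be needed.
    """
    incoming, insertions = {}, {}
    for pred in preds:
        translated = phi_translate(expr, pred)
        name = available[pred].get(translated)
        if name is None:
            # Value missing on this path: plan to insert a copy in pred.
            name = f"%pre.{pred}"
            insertions[pred] = (name, translated)
        incoming[pred] = name
    if len(insertions) > 1:
        return None  # would grow the code, so bail out
    return incoming, insertions

# %r = add (phi [%x, left], [%y, right]), 1, with "add %x, 1" already
# computed as %t in the left predecessor only.
expr = ("add", {"left": "%x", "right": "%y"}, 1)
available = {"left": {("add", "%x", 1): "%t"}, "right": {}}
print(pre(expr, ["left", "right"], available))
```

With the table above, the result is a phi of %t and a newly inserted %pre.right, and exactly one insertion is planned, in the right predecessor.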

Integrating PRE into the existing framework also allowed us to gain loop-invariant code motion (LICM) for free. The optimistic assumption, combined with PRE, allows NewGVN to speculatively hoist instructions out of loops. On the other hand, LICM in GVN relies on using LoopInfo and can only handle very specific cases.

Missing Features

The two main features our PRE implementation lacks are critical edge splitting and load coercion. Critical edge splitting is required to ensure that we do not insert instructions into paths where they won’t be used. Currently, our implementation simply bails in such cases. Load coercion allows us to detect equivalences of loaded values with different types, such as loads of i32 and float, and then coerce the loaded type using conversion operations.

The difficulty in implementing these features is that NewGVN is designed to perform analysis and transformation in separate steps, while these features involve modifying the function during the analysis phase.
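To make load coercion concrete, here is a small sketch of the i32/float case using Python's struct module to stand in for memory. It only demonstrates that the two loads see the same bits and that a bitcast-style conversion recovers one value from the other; coerce_i32_to_float is a hypothetical helper, not part of any LLVM API.

```python
import struct

def coerce_i32_to_float(bits):
    """Reinterpret the bits of a 32-bit integer as an IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Four bytes of memory at some pointer %p, written by a float store.
memory = struct.pack("<f", 1.5)

as_i32 = struct.unpack("<I", memory)[0]    # like: load i32, ptr %p
as_float = struct.unpack("<f", memory)[0]  # like: load float, ptr %p

# The float load is redundant: its value can be produced from the i32 load
# by a conversion that reinterprets the bits.
print(hex(as_i32), coerce_i32_to_float(as_i32) == as_float)
```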

Results

We evaluated our implementation using the automated benchmarking tool Phoronix Test Suite, from which we selected a set of 20 C/C++ applications (listed below).


aircrack-ng	encode-flac	luajit	scimark2
botan	espeak	mafft	simdjson
zstd	fftw	ngspice	sqlite-speedtest
crafty	john-the-ripper	quantlib	tjbench
draco	jpegxl	rnnoise	graphics-magick

The default -O2 pipeline was used. The only change between compilations was the value numbering pass used.

Despite the missing features, we observed that our implementation, on average, performs 0.4% better than GVN. However, it is important to mention that our solution hasn’t been fine-tuned to consider the rest of the optimization pipeline, which resulted in some cases where our implementation regressed compared to both GVN and the existing NewGVN. The most severe case was with jpegxl, where our implementation, on average, performed 10% worse than GVN. It’s important to note that this was an outlier; excluding jpegxl, most regressions were at most 2%. Unfortunately, due to time constraints, we were unable to study these cases in more detail.

Future Work

In the future, we plan to implement the aforementioned missing features and fine-tune the heuristics for when to perform PRE to prevent the regressions discussed in the results section. Once these issues are addressed, we’ll upstream our implementation, bringing us a step closer to reviving NewGVN.

GSoC 2024: Compile GPU kernels using ClangIR

Mon, 09 Sep 2024 00:00:00 +0000

Hello everyone! I’m 7mile. My GSoC project this summer is Compile GPU kernels using ClangIR. It’s been an exciting journey in compiler development, and I’m thrilled to share the progress and insights gained along the way here.

Background

The ClangIR project aims to establish a new IR for Clang, built on top of MLIR. As part of the ongoing effort to support heterogeneous programming models, this project focuses on integrating OpenCL C language support into ClangIR. The ultimate goal is to enable the compilation of GPU kernels written in OpenCL C into LLVM IR targeting the SPIR-V architecture, laying the groundwork for future enhancements in SYCL and CUDA support.

What We Did

Our work involved several key areas:

Address Space Support: One of the fundamental tasks was teaching ClangIR to handle address spaces, a vital feature for languages like OpenCL. Initially, we considered mimicking LLVM’s approach, but this proved inadequate for ClangIR’s goals. After thorough discussion and an RFC, we implemented a unified address space design that aligns with ClangIR’s objectives, ensuring a clean and maintainable code structure.
OpenCL Language and SPIR-V Target Integration: We extended ClangIR to support the OpenCL language and the SPIR-V target. This involved enhancing the pipeline to accommodate the latest OpenCL 3.0 specification and implementing hooks for language-specific and target-specific customizations.
Vector Type Support: OpenCL vector types, a critical feature for GPU programming, were integrated into ClangIR. We leveraged ClangIR’s existing cir.vector type to generate the necessary code, ensuring consistent compilation results.
Kernel and Module Metadata Emission: We added support for emitting OpenCL kernel and module metadata in ClangIR, a necessary step for proper integration with the SPIR-V target. This included the creation of structured attributes to represent metadata, following MLIR’s preferences for well-defined structures.
Global and Static Variables with Qualifiers: We implemented support for global and static variables with qualifiers like global, constant, and local, ensuring that these constructs are correctly represented and lowered in the ClangIR pipeline.
Calling Conventions: We adjusted the calling conventions in ClangIR to align with SPIR-V requirements, migrating from the default cdecl to SPIR-V-specific conventions like SpirKernel and SpirFunction. This also enables most OpenCL built-in functions like barrier and get_global_id.
User Experience Enhancements: Finally, we ensured that the end-to-end kernel compilation experience using ClangIR was smooth and intuitive, with minimal manual intervention required.

Results

The project successfully met its primary goals. OpenCL kernels from the Polybench-GPU benchmark suite can now be compiled using ClangIR into LLVM IR for SPIR-V. All patches have been merged into the main ClangIR repository, and the project's progress has been well-documented in the overview issue. I believe the work not only advanced OpenCL support but also laid a solid foundation for future enhancements, such as SYCL and CUDA support in ClangIR.

We have successfully compiled and executed all 20 OpenCL C benchmarks from the polybenchGpu repository, passing the built-in result validation. Please refer to our artifact evaluation repository for detailed instructions on how to experiment with our work.

Future Works

As we look forward, there are two key areas that require further development:

Function Attribute Consistency: For example, the convergent function attribute is crucial for preventing misoptimizations in SIMT languages like OpenCL. ClangIR currently lacks this attribute, which could lead to issues in parallel computing contexts. Addressing this is a priority to ensure correct optimization behavior.
Support for OpenCL Built-in Types: Another critical area for future work is the support for OpenCL built-in types, such as pipe and image. These types are essential for handling data streams and image processing tasks in various specialized OpenCL applications. Supporting these types will significantly enhance ClangIR’s adherence to the OpenCL standard, broadening its applicability and ensuring better compatibility with a wide range of OpenCL programs.

Acknowledgements

This project would not have been possible without the guidance and support of the LLVM community. I extend my deepest gratitude to my mentors, Julian Oppermann, Victor Lomüller, and Bruno Cardoso Lopes, whose expertise and encouragement were instrumental throughout this journey. Additionally, I would like to thank Vinicius Couto Espindola for his collaboration on ABI-related work. This experience has been immensely rewarding, both technically and in terms of community engagement.

GSoC 2024: Half-precision in LLVM libc

Sat, 31 Aug 2024 00:00:00 +0000

C23 defines new floating-point types, such as _Float16, which corresponds to the binary16 format from IEEE Std 754, also known as “half-precision,” or FP16. C23 also defines new variants of the C standard library’s math functions accordingly, such as fabsf16 to get the absolute value of a _Float16.

The “Half-precision in LLVM libc” Google Summer of Code 2024 project aimed to implement these new _Float16 math functions in LLVM libc, making it the first known C standard library implementation to implement these C23 functions.

We split math functions into two categories: basic operations and higher math functions. The current implementation status of math functions in LLVM libc can be viewed at https://libc.llvm.org/math/index.html#implementation-status.
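For readers unfamiliar with binary16, the sketch below shows the encoding and what a basic operation such as fabsf16 amounts to at the bit level. It uses Python's struct module (format "e" is IEEE 754 binary16) purely as an illustration; it says nothing about how LLVM libc implements these functions, and the helper names are made up.

```python
import struct

def to_f16_bits(x):
    """Round a Python float to binary16 and return its 16 raw bits."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def from_f16_bits(bits):
    """Decode 16 raw bits as a binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def fabs_f16(bits):
    """Absolute value on the raw encoding: clear the sign bit (bit 15)."""
    return bits & 0x7FFF

# Layout: 1 sign bit, 5 exponent bits, 10 significand bits.
print(hex(to_f16_bits(1.0)))                       # 0x3c00
print(from_f16_bits(fabs_f16(to_f16_bits(-2.5))))  # 2.5
print(from_f16_bits(0x7BFF))                       # 65504.0, largest finite value
print(from_f16_bits(to_f16_bits(0.1)))             # 0.1 is not exactly representable
```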

The exact goals of this project were to:

Set up generated headers properly so that the _Float16 type and _Float16 functions can be used with various compilers and architectures.
Add generic implementations of _Float16 basic operations for supported architectures.
Add optimized implementations of _Float16 basic operations for specific architectures using special hardware instructions and compiler builtins whenever possible.
Add generic implementations of as many _Float16 higher math functions as possible. We knew we would not have enough time to implement all of them.

Work done

The _Float16 type can now be used in generated headers, and declarations of _Float16 math functions are generated with #ifdef guards to enable them when they are supported.
- https://github.com/llvm/llvm-project/pull/93567
All 70 planned _Float16 basic operations have been merged.
- https://github.com/llvm/llvm-project/issues/93566
The _Float16, float and double variants of various basic operations have been optimized on certain architectures.
Out of the 54 planned _Float16 higher math functions, 8 have been merged and 9 have an open pull request.
- https://github.com/llvm/llvm-project/issues/95250

We ran into unexpected issues, such as:

Bugs in Clang 11, which is currently still supported by LLVM libc and used in post-commit CI.
Some post-commit CI workers having old versions of compiler runtimes that are missing some floating-point conversion functions on certain architectures.
Inconsistent behavior of floating-point conversion functions across compiler runtime vendors (GCC’s libgcc and LLVM’s compiler-rt) and CPU architectures.

Due to these issues, LLVM libc currently only enables all _Float16 functions on x86-64 Linux. Some were disabled on AArch64 due to Clang 11 bugs, and all were disabled on 32-bit Arm and on RISC-V due to issues with compiler runtimes. Some are not available on GPUs because they take _Float128 arguments, and the _Float128 type is not available on GPUs.

There is work in progress to work around issues with compiler runtimes by using our own floating-point conversion functions.

Work left to do

Implement the remaining _Float16 higher math functions.
Enable the _Float16 math functions that are disabled on AArch64 once LLVM libc bumps its minimum supported Clang version.
Enable _Float16 math functions on 32-bit Arm and on RISC-V once issues with compiler runtimes are resolved.

Acknowledgements

I would like to thank my Google Summer of Code mentors, Tue Ly and Joseph Huber, as well as other LLVM maintainers I interacted with, for their help. I would also like to thank Google for organizing this program.

GSoC 2024: GPU Libc Benchmarking

Fri, 09 Aug 2024 00:00:00 +0000

Hey everyone! My name is James and I worked on LLVM this summer through GSoC. My project is called GPU Libc Benchmarking. The main objective of this project was to develop microbenchmarking infrastructure for libc on the GPU.

Background

The LLVM libc project was designed as an alternative to glibc that aims to be modular, configurable, and sanitizer-friendly. Currently, LLVM libc is being ported to Nvidia and AMD GPUs to give libc functionality (e.g. printf(), malloc(), and math functions) on the GPU. As of March 2024, programs can use GPU libc in offloading languages (CUDA, OpenMP) or through direct compilation and linking with the libc library.

What We Did

During this project, we developed a microbenchmarking framework that is directly compiled for and run on the GPU, using libc functions to display output to the user. As this was a short project (90 hours), we mostly focused on developing the infrastructure and writing a few example usages (isalnum(), isalpha(), and sin()).

Our benchmarking infrastructure is based on Google Benchmark and measures the average cycles, minimum, maximum, and standard deviation of each benchmark. Each benchmark is run for multiple iterations to stabilize the results. Benchmark writers can measure against vendor implementations of libc functions by passing specific linking flags to the benchmark's CMake portion and registering the corresponding vendor function from the benchmark itself.
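The statistics mentioned above can be sketched on the host as follows. This is only an analogy: the real framework runs on the GPU and counts cycles, whereas this toy uses a nanosecond wall-clock timer, and run_benchmark and summarise are hypothetical names.

```python
import math
import time

def summarise(samples):
    """Compute mean, min, max and (population) standard deviation."""
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    return {
        "mean": mean,
        "min": min(samples),
        "max": max(samples),
        "stddev": math.sqrt(variance),
        "iterations": len(samples),
    }

def run_benchmark(func, iterations=100):
    """Time func repeatedly so that single noisy runs are averaged out."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        func()
        samples.append(time.perf_counter_ns() - start)
    return summarise(samples)

print(run_benchmark(lambda: math.sin(1.0)))
```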

Below is an example of our benchmarking infrastructure’s output for sinf():

Benchmark          | Cycles | Min  | Max  | Iterations | Time / Iteration | Stddev | Threads |
----------------------------------------------------------------------------------------------
Sinf_1             | 764    | 369  | 2101 | 273        | 7 us             | 323    | 32      |
Sinf_128           | 721    | 699  | 744  | 5          | 913 us           | 16     | 32      |
Sinf_1024          | 661    | 650  | 689  | 9          | 7 ms             | 31     | 32      |
Sinf_4096          | 666    | 663  | 669  | 5          | 28 ms            | 28     | 32      |
SinfTwoPi_1        | 372    | 369  | 632  | 70         | 7 us             | 39     | 32      |
SinfTwoPi_128      | 379    | 379  | 379  | 4          | 895 us           | 0      | 32      |
SinfTwoPi_1024     | 335    | 335  | 338  | 5          | 7 ms             | 20     | 32      |
SinfTwoPi_4096     | 335    | 335  | 335  | 4          | 28 ms            | 0      | 32      |
SinfTwoPow30_1     | 371    | 369  | 510  | 70         | 7 us             | 17     | 32      |
SinfTwoPow30_128   | 379    | 379  | 379  | 4          | 894 us           | 0      | 32      |
SinfTwoPow30_1024  | 335    | 335  | 338  | 5          | 7 ms             | 20     | 32      |
SinfTwoPow30_4096  | 335    | 335  | 335  | 4          | 28 ms            | 0      | 32      |
SinfVeryLarge_1    | 477    | 369  | 632  | 70         | 7 us             | 58     | 32      |
SinfVeryLarge_128  | 487    | 480  | 493  | 5          | 900 us           | 14     | 32      |
SinfVeryLarge_1024 | 442    | 440  | 447  | 5          | 7 ms             | 18     | 32      |
SinfVeryLarge_4096 | 441    | 441  | 442  | 4          | 28 ms            | 14     | 32      |

Users can register benchmarks similar to Google Benchmark, using a macro:

uint64_t BM_IsAlnumCapital() {
  char x = 'A';
  return LIBC_NAMESPACE::latency(LIBC_NAMESPACE::isalnum, x);
}
BENCHMARK(LlvmLibcIsAlNumGpuBenchmark, IsAlnumCapital, BM_IsAlnumCapital);

Results

This project met its major goal of creating microbenchmarking infrastructure for the GPU. The original scope of the proposal also included a CPU component that would use vendor tools to measure GPU kernel properties, but this was removed after discussion with the mentors due to technical obstacles in offloading specific kernels to the GPU that would have required major changes to other parts of the code.

Future Work

As this was a short project (90 hours), we only focused on implementing the microbenchmarking infrastructure. Future contributors can use the benchmarking infrastructure to add additional benchmarks. In addition, there are improvements to microbenchmarking infrastructure that could be added, such as more options for user input ranges, better random distributions for math functions, and a CPU element that can launch multiple kernels and compare results against functions running on the CPU.

The existing code can be found in the LLVM repo.

Acknowledgements

This project would not have been possible without my amazing mentor, Joseph Huber, the LLVM Foundation admins, and the GSoC admins.

LLVM Google Summer of Code 2024 & 2023

Thu, 14 Mar 2024 00:00:00 +0000

The LLVM organization was accepted to participate in Google Summer of Code in 2024. Soon, prospective participants will begin submitting their project proposals, and mentors will review them to select those who will spend a significant amount of time this year contributing to various parts of LLVM.

But first, let’s look back and see what we had in 2023. The Google Summer of Code 2023 was very successful for the LLVM project. Overall, we received 54 proposals for 24 open projects. Out of this set of proposals, 20 projects were successfully completed and covered many different aspects of LLVM and its subprojects.

ExtractAPI while building by Ankur Saini, mentored by Daniel Grumberg
WebAssembly Support for clang-repl by Anubhab Ghosh, mentored by Vassil Vassilev and Alexander Penev
Modules Build Daemon: Build System Agnostic Support for Explicitly Built Modules by Connor Sughrue, mentored by Jan Svoboda, Michael Spencer
Interactive MLIR query tool to make exploring the IR easier (https://summerofcode.withgoogle.com/archive/2023/projects/bdePp9VD) by Devajith Valaparambil Sreeramaswamy, mentored by Jacques Pienaar
Improving Compile Times by Dhruv Chawla, mentored by Nikita Popov
Adding C++ Support to Clang’s ExtractAPI by Erick Velez, mentored by Daniel Grumberg
Optimizing MLIR’s Presburger library by gilsaia, mentored by Kunwar Grover
Adapting IR Load Semantics to Freeze All or Freeze Only Uninitialized Data by John McIver, mentored by Nuno Lopes
Addressing Rust optimization failures in LLVM by Kohei Asano, mentored by Nikita Popov
Tutorial development with clang-repl by Krishna Narayanan, mentored by Vassil Vassilev
Fix Handling of Undefined Behavior in NewGVN by Manuel Brito, mentored by Nuno Lopes
Map LLVM values to corresponding source-level expressions by phyBrackets, mentored by Satish Guggilla and Karthik Senthil
Machine Learning Guided Ordering of Compiler Optimization Passes by Puneeth A R, mentored by Tarindu Jayatilaka, Johannes Doerfert, and Mircea Trofin
Patch based test coverage for quick test feedback by ShivamGupta123, mentored by Henrik Olsson
Re-optimization using JITLink by Sunho Kim, mentored by Vassil Vassilev, Stefan Gränitz, and Lang Hames
Improvements in Clang Diagnostics by Takuya Shimizu, mentored by Timm Bäder
Build & Run SingleSource Benchmarks with ClangIR by Vinicius Espindola, mentored by Bruno Cardoso Lopes and Nathan Lanza
Better Performance Models for MLGO Training by Viraj Shah, mentored by Mircea Trofin, Aiden Grossman, and Ondrej Sykora
Enhancing llvm-cov to Generate Hierarchical Coverage Reports by Yuhao Gu, mentored by Petr Hosek and Gulfem Savrun Yeniceri
Autocompletion in Clang-REPL by Yuquan Fu, mentored by Vassil Vassilev

Some projects also provided detailed end-of-project reports or project diaries that are outstanding on their own:

Tutorial Development with Clang-Repl
Diagnostic Improvements in Clang 17
Improving Compile Times
Addressing Rust optimization failures in LLVM
Map LLVM Values to corresponding source level expression
Another step forward towards interactive programming - covers Autocompletion in Clang-REPL, WebAssembly Support for Clang-Repl, Re-optimization using JITLink and Tutorial development with clang-repl projects.

GSoC 2024

With a successful end to 2023, the LLVM Project is excited to participate in GSoC 2024. If you are interested in participating, here are some guidelines:

1. Project ideas

Please take a look at the list of projects on the Open Projects page. Each project also has a topic on LLVM Discourse with the #gsoc24 tag, so you can ask mentors about the details of the project, the skills required, etc.

2. Submitting a proposal

We encourage you to discuss your proposal before submitting it to the GSoC system. Having your proposal discussed ensures that it will be well aligned with the project. Please do not hijack other threads (e.g. with mentor Q&A); instead, create a separate new thread to discuss your proposal. The ideal proposal will contain:

A descriptive title
Information about you, including contact information. Please do not forget to include:
- Your prior compiler and compiler-related experience, if any (e.g. studies at the University, prior contributions)
- Whether you have any prior contributions to LLVM. If yes, please provide links to these contributions.
- Your past open source participation and contributions, if any
- Your knowledge of programming languages (e.g. C, C++, Python, Rust, etc.) and your estimate of your level of experience
Information about your proposed project. This should be fairly detailed and include a timeline.
Information about other commitments that might affect your ability to work during the GSoC period (exams, classes, holidays, other jobs, weddings, etc.). Also, if the project allows both medium- and large-size participation, indicate the intended size of the project and the timeframe of your participation.

3. Useful links

LLVM Contribution Guidelines
LLVM Developer Policy
GSoC channel on LLVM Discord
Other documents
LLVM Community Code of Conduct
GSoC Contributor Guide
Advice for People Applying for GSoC
GSoC Program Website
LLVM Office Hours

4. Deadlines

Submission to GSoC system opens on March 18th at 18:00 UTC.
Submission to GSoC system ends on April 2nd at 18:00 UTC.
Results to be announced on May 1st at 18:00 UTC.

Welcome to the 20th Google Summer of Code!

Another step forward towards interactive programming

Sun, 31 Dec 2023 00:00:00 +0000

The Compiler Research team is pleased to announce the successful completion of another round of internships focused on enhancements in interactive programming, specifically in relation to the Clang-REPL component in LLVM.

The Compiler Research team includes researchers located at Princeton University and CERN. Our primary goal is best described as follows:

To establish a proficient workflow in LLVM, where interactive development in C++ is possible, and exploratory C++ becomes an accessible experience to a wider audience.

Following are some notable contributions by our interns this year.

Yuquan Fu - Autocompletion in Clang-REPL

Clang-Repl allows developers to program in C++ interactively with a REPL environment. However, it was missing the ability to suggest code completion or auto-complete options for user input, which can be time-consuming and prone to typing errors.

With this code completion system, users can either complete their input quickly or see a list of valid completion candidates. The code completion is also context-aware, providing semantically relevant results based on the current position and input on the current line.

Mentors: Vassil Vassilev (Princeton.edu) & David Lange (Princeton.edu)

Project Details: Autocompletion in Clang-REPL

Funding: Google Summer of Code 2023

Example – avoiding tedious typing

clang-repl> struct WhateverMeaningfulLoooooooooongName{ int field;};
clang-repl> Wh<tab>

With code completion, hitting tab completes the entity name:

clang-repl> WhateverMeaningfulLoooooooooongName

For implementation details, please see the respective slides and the blog.
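The user-facing behaviour in the example above (complete when unambiguous, otherwise list candidates) can be approximated with plain prefix matching. This sketch is deliberately naive: the real feature is context-aware, which string matching is not, and on_tab and the name list are assumptions made for illustration.

```python
import os

def complete(prefix, names):
    """Return the known names that start with prefix, sorted."""
    return sorted(n for n in names if n.startswith(prefix))

def on_tab(prefix, names):
    """Return (new input, candidates) for a tab press after prefix."""
    candidates = complete(prefix, names)
    if not candidates:
        return prefix, []
    # Extend the input as far as all candidates agree.
    return os.path.commonprefix(candidates), candidates

LONG_NAME = "WhateverMeaningfulLoooooooooongName"
names = [LONG_NAME, "field", "while_helper", "printf", "println"]

print(on_tab("Wh", names))   # single candidate: the input is completed
print(on_tab("pri", names))  # two candidates: extended to "print" and listed
```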

Anubhab Ghosh - WebAssembly Support for Clang-Repl

The Xeus Framework enables accessing Clang-REPL (an interpreter that JIT compiles C++ code into native code) in a web browser, using Jupyter. However, this shifts the computational load to the server.

A more scalable approach is to use WebAssembly. It allows sandboxed execution of native (e.g. C/C++/Rust) programs compiled to an intermediate bytecode at close to native speed. The idea is to run clang-repl within WebAssembly and generate JIT-compiled WebAssembly code and execute it on the client side.

However, this comes with some challenges (e.g., code in WebAssembly is immutable, which is unacceptable for JIT).

Solution: To address the code immutability issue, a new WebAssembly module is created at each iteration of the REPL loop. Initially, a precompiled module containing the Standard C/C++ libraries, LLVM, Clang, and wasm-ld is sent to the browser, which runs the interpreter and compiles the user code.

Since we cannot call Interpreter::Execute() to execute the module (due to JITLink reliance), the LLVM WebAssembly backend is used manually to produce an object file. This file is then passed to the WebAssembly version of LLD (wasm-ld) to turn it into a shared library which is written to the virtual file system of Emscripten. The dynamic linking facilities of Emscripten can be used to load this library.

Mentors: Vassil Vassilev (Princeton.edu) & Alexander Penev (Uni-Plovdiv.bg)

Project Details: WebAssembly Support for Clang-Repl

Funding: Google Summer of Code 2023

Example:

SDL_Init(SDL_INIT_VIDEO);
SDL_Window *window;
SDL_Renderer *renderer;
SDL_CreateWindowAndRenderer(300, 300, 0, &window, &renderer);

This should connect to a simple black canvas. Next, we can draw things into it.

SDL_SetRenderDrawColor(renderer, 0x80, 0x00, 0x00, 0xFF);
SDL_Rect rect3 = {.x = 20, .y = 20, .w = 150, .h = 100};
SDL_RenderFillRect(renderer, &rect3);
SDL_SetRenderDrawColor(renderer, 0x00, 0x80, 0x00, 0xFF);
SDL_Rect rect4 = {.x = 40, .y = 40, .w = 150, .h = 100};
SDL_RenderFillRect(renderer, &rect4);
SDL_RenderPresent(renderer);

The output should look something like this:

Sunho Kim - Re-optimization using JITLink

In order to support re-optimization, the JITLink API was extended by adding the cross-architecture stub creation API. This API works on all platforms and architectures that JITLink supports, and through it we can create the redirectable stubs by using JITLink.

Once the re-optimization API was developed, it was time to actually implement re-optimization. A new layer was introduced to support re-optimization of IR modules. There were many abstraction levels where redirection could be implemented, but we ended up doing it at the IR level since that allows many re-optimization techniques to be implemented easily by transforming the IR directly. From an API perspective, the most flexible abstraction level to do this may be at the FrontEnd AST level.

Clang-Repl relies on LLJIT to do JIT-related tasks. Enabling re-optimization for LLJIT also helped enable it in Clang-Repl. However, there were minor challenges (e.g., a mismatch between how clang-repl expects the runtime to execute static initializers and how the ELF ORC runtime runs them). Possible solutions for these are in discussion (e.g., adding a new dl function). Nevertheless, we now have a real-world experimental environment where we can test new re-optimization techniques and perform benchmarks to see if they are useful.

Finally, based on the above infrastructure, profile guided optimization is now possible (by transforming the IR module). There are still some enhancements pending before the code is fully upstreamed, but the current code achieves instrumentation on the orc-runtime side, which simplifies implementation by a lot.

Mentors: Vassil Vassilev (Princeton.edu) & Lang Hames/ lhames (Apple)

Project Details: Re-optimization using JITLink

Funding: Google Summer of Code 2023

Example: Running the -O2 optimization if a function was called more than 10 times

The following example builds a PassManager using the LLVM library and then runs the optimization pipeline.

static Error reoptimizeToO2(ReOptimizeLayer &Parent,
                            ReOptMaterializationUnitID MUID,
                            unsigned CurVersion, ResourceTrackerSP OldRT,
                            ThreadSafeModule &TSM) {
  // Run the optimization pipeline on the module while holding its lock.
  TSM.withModuleDo([&](llvm::Module &M) {
    auto PassManager = buildPassManager();
    PassManager.run(M);
  });
  return Error::success();
}

ReOptLayer->setReoptimizeFunc(reoptimizeToO2);
ReOptLayer->setAddProfileFunc(reoptimizeIfCallFrequent);

For more examples, please see the LLVM-JITLink-COFF-Example repo.
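The idea of redirecting callers through a stub and swapping the target once a function becomes hot can be modelled in a few lines. This is a conceptual sketch only, with no relation to the JITLink or ORC APIs; RedirectableStub and both square functions are made up, and the threshold of 10 mirrors the example above.

```python
class RedirectableStub:
    """Callers always go through the stub, so its target can be swapped."""

    def __init__(self, target, reoptimize, threshold=10):
        self.target = target
        self.reoptimize = reoptimize
        self.threshold = threshold
        self.calls = 0
        self.reoptimized = False

    def __call__(self, *args):
        self.calls += 1  # stands in for the inserted profiling counter
        if not self.reoptimized and self.calls > self.threshold:
            # Hot function: build a better version and redirect to it.
            self.target = self.reoptimize(self.target)
            self.reoptimized = True
        return self.target(*args)

def slow_square(n):
    # "Unoptimized" version: repeated addition.
    return sum(n for _ in range(n))

def fast_square(n):
    # "Optimized" version with the same result.
    return n * n

square = RedirectableStub(slow_square, lambda old: fast_square)
results = [square(7) for _ in range(12)]
print(results[0], results[-1], square.target.__name__)
```

The caller never notices the switch: every call returns the same result, but from the eleventh call on the stub dispatches to the faster version.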

Krishna Narayanan - Tutorial development with clang-repl

Open Source documentation is often a neglected area in the software lifecycle. Specifically, this project targeted helping contributors by documenting how they can set up respective environments on their local machines to contribute to the code and documentation of the respective project. These environments were set up locally, tested and then the setup methodology was updated in the relevant documentation.

Besides other compiler research technologies, write-ups were also added to LLVM (specifically the Clang-Repl documentation) as part of this project. Usage examples were also added.

Mentors: Vassil Vassilev (Princeton.edu) & David Lange (Princeton.edu)

Project Details: Tutorial development with clang-repl

Funding: Google Summer of Code 2023

Example

// Classes and Structures
clang-repl> #include <iostream>
clang-repl> class Rectangle {int width, height; public: void set_values (int,int);\
clang-repl... int area() {return width*height;}};
clang-repl> void Rectangle::set_values (int x, int y) { width = x;height = y;}
clang-repl> int main () { Rectangle rect;rect.set_values (3,4);\
clang-repl... std::cout << "area: " << rect.area() << std::endl;\
clang-repl... return 0;}
clang-repl> main();
area: 12

Tools for Learning LLVM TableGen

Thu, 07 Dec 2023 00:00:00 +0000

TableGen is a language used within the LLVM project for generating a variety of files where manual maintenance would be very difficult.

For example, it is used to define all of the instructions that can be used on a particular architecture. The information is defined in TableGen and we can produce many things based on that single source file: C++ code, documentation, command line options, and so on.

TableGen has been in existence since before the first official release of LLVM, over 20 years ago.

Today in the LLVM project repository there are over a thousand TableGen source files totalling over 500,000 lines of code, making it the 5th most popular language in the repository.

Language	files	blank	comment	code
C++	29642	958542	1870101	5544445
C/C++ Header	11844	316806	499845	1486165
C	10535	259900	1603594	1011269
Assembly	10694	478035	1222315	820236
TableGen	1312	94112	83616	580289

(Counted from this commit, rest of table omitted)

With projects such as MLIR embracing TableGen, it is only going to grow. So if you are contributing to LLVM, you will encounter it at some point.

This might be a problem, as TableGen only exists within LLVM. Unlike a language such as C++, TableGen does not have a large array of resources.

So, as well as joining a new project, you also need to learn a new Domain Specific Language (DSL). You did not come to LLVM to learn a DSL; you probably came here to write a compiler.

I cannot say when this problem might be solved, but the situation is not as bleak as it appears. There have been big improvements in TableGen tools recently, which means you can put more of your energy into the goals that brought you to LLVM in the first place.

A Brief Introduction to TableGen

Imagine you wanted to represent the registers of an architecture. I am going to use Arm’s AArch64 in particular here.

You could describe them in TableGen as:

$ cat register.td
class Register<int _size, string _alias=""> {
  int size = _size;
  string alias = _alias;
}

// 64 bit general purpose registers are X<N>.
def X0: Register<8> {}
// Some have special alternate names.
def X29: Register<8, "frame pointer"> {}
// Some registers omitted...

By default, the TableGen compiler llvm-tblgen creates “records”, which are shown below.

$ ./bin/llvm-tblgen register.td
------------- Classes -----------------
class Register<int Register:_size = ?, string Register:_alias = ""> {
  int size = Register:_size;
  string alias = Register:_alias;
}
------------- Defs -----------------
def X0 {        // Register
  int size = 8;
  string alias = "";
}
def X29 {       // Register
  int size = 8;
  string alias = "frame pointer";
}

This is the intermediate representation (IR) of the TableGen compiler, similar to LLVM’s “LLVM IR”.

When using LLVM you would select a “target” which is the processor architecture you want to generate instructions for. TableGen’s equivalent is a “backend”. These backends do not generate instructions, but instead output a format for that backend’s specific use case.

For example, there is a backend that generates C++ code for searching data tables. Other examples are C header files and reStructuredText documentation.

                       TableGen source
                              |
+--llvm-tblgen----------------|------------------------+
|                             v                        |
|              +----- Expanded records ----+           |
|              |                           |           |
|              v                           v           |
| +-------------------------+    +-------------------+ |
| | --gen-searchable-tables |    | Other backends... | |
| +-------------------------+    +-------------------+ |
|              |                           |           |
+--------------|---------------------------|-----------+
               v                           v
    .inc file with C++ code     Other output formats...
    for table searching.

The main compiler is llvm-tblgen, but there are others specific to sub-projects of LLVM. For example clang-tblgen and lldb-tblgen. The only difference is the backends included in each one, the language is the same.

You might take your register definitions and produce C++ code to initialise them in some kind of bootloader. Perhaps you also document it and produce a diagram of the process. With enough backends, you could do all that from the same TableGen source code.
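To make the idea of generated C++ more concrete, here is a hand-written sketch of the kind of table a backend could emit for the X0 and X29 records shown earlier. It is not real llvm-tblgen output: the names RegisterInfo, RegisterTable and lookupRegister are invented for illustration, and a real backend would choose its own layout.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for backend output generated from register.td.
struct RegisterInfo {
  const char *name;   // def name, for example "X0"
  int size;           // the "size" field of the record
  const char *alias;  // the "alias" field, empty when not set
};

// One entry per def in the TableGen source.
static const RegisterInfo RegisterTable[] = {
    {"X0", 8, ""},
    {"X29", 8, "frame pointer"},
};

// Linear search by name; returns nullptr if the register is unknown.
static const RegisterInfo *lookupRegister(const char *name) {
  const std::size_t count = sizeof(RegisterTable) / sizeof(RegisterTable[0]);
  for (std::size_t i = 0; i < count; ++i)
    if (std::strcmp(RegisterTable[i].name, name) == 0)
      return &RegisterTable[i];
  return nullptr;
}
```

Code like this would normally be regenerated whenever the TableGen source changes, so nobody edits the table by hand.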

You would write these backends either in C++ within the TableGen compiler, or as an external backend using the compiler’s JSON output (--dump-json). So you can use any language with a JSON parser (such as Python).

There is TableGen and There Are Things Built With TableGen

This is more a mindset than a tool. It is summed up best by a quote from the documentation:

“Despite being very generic, TableGen has some deficiencies that have been pointed out numerous times. The common theme is that, while TableGen allows you to build domain specific languages, the final languages that you create lack the power of other DSLs, which in turn increase considerably the size and complexity of TableGen files.
At the same time, TableGen allows you to create virtually any meaning of the basic concepts via custom-made backends, which can pervert the original design and make it very hard for newcomers to understand the evil TableGen file.”

This means that you will be tackling TableGen, and things built with TableGen, which are often more complicated than the language.

It is like learning C++ and struggling to use Boost. Someone might say to you, “Boost is not required, why not remove it and save yourself the hassle?”. As someone new to C++, you might not be aware of the boundary between the two of them.

Of course this does not help you too much if the project you want to contribute to uses Boost. You are stuck dealing with both. In LLVM terms, the TableGen language and the backends that consume it are a package deal.

I mention this so that you can draw a distinction between not understanding one or the other. Knowing which one is confusing you is a big advantage when finding help.

For any task there are probably one or two “things built with TableGen” that you need to understand and even then, not entirely.

Do not think that your TableGen journey must end with understanding all the ways it is used. That is possible, but it is not required, and hardly anyone learns everything. Instead put your energy into the things that really interest you.

Compiler Explorer

Of course we have TableGen in Compiler Explorer! Is a language even real if it is not in Compiler Explorer?

(Of course it is, but if your favourite language is not there, Compiler Explorer has excellent documentation and friendly maintainers)

Compiler Explorer is a whole bunch of different versions of compilers for different languages and different architectures that you can access with just a browser tab.

It is an incredible tool for learning, teaching, triaging, optimising and many more things. I will not go into detail about it here, just a few things about TableGen’s inclusion.

The obvious thing is that llvm-tblgen does not emit instructions (though a hypothetical backend could) so there is no option to compile to binary or execute code.

By default, records are printed as plain text. You can choose a backend by adding a compiler option, or by opening the “Overrides” menu and selecting an “Action”.

It is important to note that TableGen backends have very specific expectations of what will be in the source code. As if you had a C++ compiler that would not compile for Arm unless it saw arm_is_cool somewhere in the source code.

In the LLVM repository all the required classes are set up for you, but in Compiler Explorer they are not. So, if you would like to experiment with an existing backend, I suggest you provide stub implementations of the classes, or copy some from the LLVM project repository. You can also use standard includes from include/llvm/*.td.

It is not possible at this time to develop a backend within Compiler Explorer, but you can select the JSON backend and copy that JSON to give to local scripts.

Multi-file projects (“IDE mode”) also work as expected, so, if you would like, you can have your own include files.

Finally, remember that you can share Compiler Explorer examples. If you are asking or answering questions about TableGen, always include a Compiler Explorer link if you can!

Jupyter Notebooks

Jupyter creates interactive notebooks. A notebook is a single document which contains text, code and the results of running that code. This enables you to edit the code and rerun it to update the results in the notebook.

This is great for taking notes or building up large examples from small chunks of code. You can export the document as a notebook that anyone can edit, or in noninteractive formats such as PDF or Markdown.

TableGen can be used in notebooks by using the TableGen Jupyter Kernel. Installation instructions are available here and you can watch me talk more about it here.

Note: There is also an MLIR kernel for Jupyter, along with many others.

We have aimed to give the same experience as other languages, so I will focus not on how to use a notebook, but instead on what we have been able to make with them.

TableGen Tutorial Notebook

This notebook is an introduction to TableGen. You can read it on GitHub, or download it and read it in Jupyter.

When using Jupyter, you can edit the document to add your own examples or expand the ones that you find interesting.

“How to Write a TableGen Backend” Notebook

This notebook uses Python instead of TableGen, and it shows you how to write a backend.

The 2021 EU LLVM Developer’s Meeting talk “How to write a TableGen backend” by Min-Yih Hsu is the basis for this. The notebook is in fact a Python port of Min’s own C++ implementation.

It shows you how to take the JSON output of llvm-tblgen and process it with Python to create SQL queries.

What is unique here is we now have the same content in multiple media forms and multiple programming languages. Choose the ones that suit you best.

Referring back to “There is TableGen and There are Things Built With TableGen”, the tutorial notebook is TableGen. The writing a backend notebook is “Things Built With TableGen”.

Limitations

The major limitation of the notebooks is that we have no output filtering. This means if you do include "llvm/Target/Target.td" you will get about 320,000 lines of output (before you have added any of your own code). This is more than a default notebook accepts from a kernel and when I removed that limit, the browser tab crashed.

This is not a problem in most cases and the possible solutions have big trade-offs, so we are not going to rush a fix. If it does affect you, please add your feedback to the tracking issue.

TableGen Language Server

The MLIR project has implemented a server for the Language Server Protocol (LSP), which supports TableGen and 2 other languages used within MLIR.

The language server protocol provides information to compatible editors about the structure of a language and project. For example, where are the included files? Where is the definition of a particular type?

If you have used an LSP compatible editor (such as Visual Studio Code), you have probably used a language server without knowing. “Go To Definition” is the most common feature they provide.

The Language Server Protocol allows you to open a project, go to the code you want to change and jump from there directly to the other relevant parts of the repository. With 500,000+ lines of TableGen in the LLVM project, that is a lot of code you get to ignore!

Setup

You will need a copy of the server binary tblgen-lsp-server, which you can get from the release package for your platform, or you can build it yourself.

This is how to build it yourself:

$ cmake -G Ninja <path-to>/llvm-project/llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="mlir"
$ ninja tblgen-lsp-server

Having run those commands, tblgen-lsp-server is found in <build-dir>/bin/.

The server reads a compilation database file tablegen_compile_commands.yml, which is made for you when you configure LLVM using CMake.

This serves a similar purpose to the compile_commands.json file generated when using CMAKE_EXPORT_COMPILE_COMMANDS, but the two files are not related.

As long as your checkout of llvm-project includes this commit, the compilation database includes TableGen files from all enabled projects (prior to that commit it was MLIR only).

For example this configure command includes information about TableGen files from the LLVM, Clang, MLIR and LLDB subprojects:

$ cmake -G Ninja <path-to>/llvm-project/llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang;llvm;lldb;mlir"

This also applies to -DLLVM_TARGETS_TO_BUILD=. Enabling only one target means that the compilation database only has files relevant to that target.

Note: You do not need to build a project to include its TableGen files in the compilation database. Configuring is all that is needed.

Next, configure the LSP client for your editor.

If you are using Visual Studio Code, install the MLIR extension. Then follow the setup instructions here to tell the extension where the server and compilation database are.

If you are using a different editor, refer to its documentation to learn how to set up a language server. Setting the path to the compilation database may require the use of the server’s command line options. Run tblgen-lsp-server --help to see all available options.

Example

This example assumes you have configured LLVM with the AArch64 target enabled. (It is enabled by default.)

Open the file llvm/lib/Target/AArch64/AArch64.td.
Put your cursor on a use of the SubtargetFeature type.
In the menu bar, select “Go” then “Go to Definition”.
This takes you to llvm/include/llvm/Target/Target.td, where SubtargetFeature is defined.

Limitations

The language server highlights an anti-pattern in the way some LLVM targets such as AArch64 use TableGen.

You may find yourself in a file that uses a class but does not define it or include any files which define it. This is because this file is intended to be included in another file, which does include a definition of that class.

example.td:
  class Example {}

uses_example.td:
  def example: Example {}

main.td:
  include "example.td"
  include "uses_example.td"

The example above shows this anti-pattern:

The file example.td defines the class Example.
uses_example.td uses the class Example, but does not include example.td.
main.td includes both example.td and uses_example.td.
main.td is the file that is compiled.
When you are in uses_example.td, the language server does not know where Example is defined.
When you are in main.td, the language server does know where Example is defined.

Perhaps we can address this by improving the language server, or reorganising the includes so we do not have files that appear to be isolated.

Dump

What about printf? The best debugging tool of them all.

TableGen’s equivalent is dump, and its companion repr.

def op;
class A {
  string A = "some text";
  dag X =(op op);
}
def a : A;

dump "The Value of a is: \n" # !repr(a);

dump prints to stderr:

<source>:8:1: note: The Value of a is:
a {     // A
  string A = "some text";
  dag X = (op op);
}
dump "The Value of a is: \n" # !repr(a);
^

This was added recently. So you will need a recent build, or a released version 18.0 or newer (which is unreleased at time of writing).

Of course you can try this on Compiler Explorer right now!

Assertions

An assertion checks that a condition is true at a specific point in your program. An assertion consists of:

The keyword assert.
A condition (usually a call to one of the bang operators).
A message.

If the condition is false, a compiler error is generated with the message you provided.

For example, the code below checks that you have not tried to make a register with a size that is 0 or less.

class Register<int _size> {
  assert !gt(_size, 0),
         "Register size must be > 0, not " # _size # "." ;
  int size = _size;
}

def X0: Register<8> {}
def X1: Register<-8> {}

(Try this on Compiler Explorer)

The register X0 has _size=8, so the condition !gt(_size, 0) (which would be _size > 0 in C syntax) is true and therefore no error is generated.

The register X1 has _size=-8, so the condition is false and an error is generated. The compiler output is shown below:

<source>:2:11: error: assertion failed
  assert !gt(_size, 0),
          ^
note: Register size must be > 0, not -8.

While learning new code it is helpful to add your own assertions to check your assumptions. In addition, adding assertions to code written to be used by other people is a good way to stop them using it incorrectly. Unlike documentation, you cannot miss an assertion error.

Find In Files

This is last because in an ideal world it would be the last option, but it is often not the least of the options. Grep, ack, Find In Files, whatever you call it, searching text is unreasonably effective if you have a little knowledge of the language syntax.

Why should I mention such an obvious idea? Well, obvious is subjective, and there is a special situation that makes it more effective than usual.

In the LLVM project repository we have the vast majority of TableGen code in use today. Would you like to know how to use a particular feature? It is all there, somewhere in 500,000+ lines of source code. You would be surprised by what a simple query can find despite that.

Think about the thing you are trying to find. What do you think its source code would look like? If it is a class would it have template arguments or not and so would there be a < after the name? If it is an error message, what parts would be constant and what parts would be inserted into a template message?

Expected end of line is likely to be a static string so you can search for the message itself. In contrast, class Foo has no attribute Bar is more likely to be created by substituting in the name of the class and attribute. So a good search term for this would be has no attribute.

There are also tests for the compiler, most of which are in this folder. This folder contains minimal examples for the language features. Try narrowing your search to this location.

Conclusion

Learning TableGen does not have to be scary. Do not think that because it is an isolated DSL that it does not have what you have come to expect from your favourite languages.

Keep in mind that TableGen is also a tool, not a goal in itself. If you can achieve your goals with a limited but accurate understanding of TableGen and its backends, that is great. Learn as much as you want or need.

In addition to the tools, there is an active community ready to answer your questions on Discord or the forums.

If you find problems or want to contribute improvements please do so. Open a GitHub Issue or Pull Request.

Look at the other languages you use. Do they have these tools? Should they? They might be the difference between frustration and your new favourite language.

Acknowledgements

Thank you to Andrzej Warzyński, Francesco Petrogalli, Min-Yih Hsu and Sally Neale (Arm) for reviewing this article.

Tutorial Development with Clang-Repl

Thu, 05 Oct 2023 00:00:00 +0000

Introduction

I’m Krishna Narayanan, a final-year undergraduate at Veermata Jijabai Technological Institute, Mumbai. In this blog post I describe my GSoC project: its goals and the tasks we accomplished during the summer. The project aims to develop tutorials demonstrating the current capabilities of Clang-Repl. Clang-Repl is needed because it presents opportunities for rigorous open-source development. Although it is inspired by Cling, the two are similar rather than identical, and work is still needed to add Xeus protocol support for Clang-Repl. In the same spirit, tutorials were developed for CppInterOp (a Clang-based C++ interoperability library), Xeus-cpp (an interactive programming environment for C++) and Xeus-clang-repl.

Contributions:

This is the list of contributions according to the pull requests I have sent:

1. Update Clang-Repl Documentation

This patch adds documentation for Clang-Repl, providing the details of its usage and emphasising the features it offers to the user. The REPL nature enables users to prototype and experiment, and gives a user-friendly experience. Similarly, the second patch adds “Clang-Repl Execution Handling Results” by Saqib, in which we enabled the graphviz extension for the llvm/clang documents to support the graphviz convention for pictorial representation (under review).

2. Add C++ InterOp Documentation Setup

CppInterOp is a Clang-based C++ interoperability library, which enables interoperability between C++ code and more interactive languages like Python. The above patch adds a documentation setup for CppInterOp consisting of both Sphinx and Doxygen documentation. The patches cover the topics relevant to the development and usage of CppInterOp, including building from source (installation), usage, FAQs, developer’s documentation, tutorials and references. The tutorials in the docs give a detailed understanding of C++ InterOp usage, which includes C-C++ interoperability and C++-Python interoperability.

3. Add xeus-clang-repl Documentation Setup

Xeus-clang-repl integrates clang-repl with the xeus protocol and is a platform for C++ usage in Jupyter Notebooks. The above patch adds a documentation setup for xeus-clang-repl consisting of both Sphinx and Doxygen documentation. The documentation covers installation, usage, importance and references for xeus-clang-repl. It includes tutorials portraying the different features that can be used in xeus-clang-repl, especially the C++-Python integration, with both languages executing in the same Jupyter cell with the help of magic commands (%%python).

4. Add xeus-cpp Documentation Setup

https://github.com/compiler-research/xeus-cpp/pull/13

Xeus-cpp is an interactive programming environment that allows you to execute C++ code in a Jupyter Notebook. The above patch adds a documentation setup for xeus-cpp consisting of both Sphinx and Doxygen documentation. The documentation covers installation, usage, importance and references for xeus-cpp. It includes tutorials portraying the different features that can be used in xeus-cpp, especially the C++-Python integration, with both languages executing in the same Jupyter cell with the help of magic commands (%%python).

5. Others

These are miscellaneous patches contributing to the development of the current documentation setup and content. They include the migration from the v1 to the v2 configuration for the readthedocs setup.

Acknowledgements

All the goals that were originally proposed have been completed to the best of my abilities (the xeus-cpp setup has not been merged yet). In the upcoming weeks I will keep improving these things based on suggestions, to make the tutorials and documentation more understandable for users. I will keep working with the compiler research group even after the GSoC period, contributing to tutorials and landing other development patches.

I am extremely grateful to my mentors, Vassil and David, for their constant help and support in the last two months. Many thanks to Vassil for providing help, reviewing the code promptly and guiding me towards the final goal, and special thanks to Parth and Baidyanath for guiding me during the initial phase of GSoC. The journey has been full of learning, experiencing a new tech stack and realising the importance of documentation and tutorials for a better user experience. I thank Vassil and all LLVM community members for giving me this great opportunity to work with such an evergreen and interesting community.

Finally, I thank everyone at the compiler research group for assisting me throughout the GSoC period with many new concepts and the help I needed. I am also thankful to Google for providing me the opportunity to work on this project during the summer, which helped me learn a lot and will surely help in my future career.

Diagnostic Improvements in Clang 17

Tue, 19 Sep 2023 00:00:00 +0000

Introduction

In the last few months, I have been a part of an ongoing effort to improve Clang’s diagnostic capabilities. The newly released Clang 17 brings several of these improvements to the forefront. This blog post aims to provide a comprehensive overview of these diagnostic enhancements. We will employ simplified code examples and compare diagnostic outputs from Clang 16 and Clang 17 to illustrate how the latest updates can enhance the development experience for Clang users.

Multi-line printing of code snippets

One of the most anticipated diagnostic features of Clang 17 is its support for multi-line printing of code snippets. This marks a departure from the old single-line limit, which used to make it difficult to fully understand the context around a code issue. This new feature improves the readability and comprehensibility of diagnostic messages by displaying a more complete view of the code in question. Moreover, line numbers are now attached to the left of each line, allowing for quicker navigation and issue resolution.

int func(
  int a, int b, int& r);

void test(int *ptr) {
  func(3, 4, 5);
  func(3, 4);
}

Before:

<source>:5:3: error: no matching function for call to 'func'
  func(3, 4, 5);
  ^~~~
<source>:1:5: note: candidate function not viable: expects an lvalue for 3rd argument
int func(
    ^
<source>:6:3: error: no matching function for call to 'func'
  func(3, 4);
  ^~~~
<source>:1:5: note: candidate function not viable: requires 3 arguments, but 2 were provided
int func(
    ^

After:

<source>:5:3: error: no matching function for call to 'func'
    5 |   func(3, 4, 5);
      |   ^~~~
<source>:1:5: note: candidate function not viable: expects an lvalue for 3rd argument
    1 | int func(
      |     ^
    2 |   int a, int b, int& r);
      |                 ~~~~~~
<source>:6:3: error: no matching function for call to 'func'
    6 |   func(3, 4);
      |   ^~~~
<source>:1:5: note: candidate function not viable: requires 3 arguments, but 2 were provided
    1 | int func(
      |     ^
    2 |   int a, int b, int& r);
      |   ~~~~~~~~~~~~~~~~~~~~

In this example, the newly covered source ranges make it easier to understand why the overload candidate is invalid.

Commit: https://reviews.llvm.org/D147875 (Timm Bäder)

Clang warns on macro redefinitions. When the redefinition happens in assembly files, and the previous definition of the macro comes from the command line, the last definition is now diagnosed as coming from <command line> instead of <built-in>.

Assembly file:

#define MACRO 3

Clang invocation command:

clang -DMACRO=1 file.S

Before:

warning: 'MACRO' macro redefined [-Wmacro-redefined]
#define MACRO 3
        ^
<built-in>:362:9: note: previous definition is here
#define MACRO 1
        ^

After:

warning: 'MACRO' macro redefined [-Wmacro-redefined]
    1 | #define MACRO 3
      |         ^
<command line>:1:9: note: previous definition is here
    1 | #define MACRO 1
      |         ^

Commit: https://reviews.llvm.org/D145397 (John Brawn)

Clang 17 emits a warning on any language-defined builtin macro being undefined or redefined, some of which were just ignored in Clang 16.

#undef __cplusplus

Before: No Warning

After:

<source>:1:8: warning: undefining builtin macro [-Wbuiltin-macro-redefined]
    1 | #undef __cplusplus
      |        ^

Redefinition of compiler builtin macros usually leads to unintended results because library headers often rely on these macros, and they do not expect these macros to be modified by users.

Commit: https://reviews.llvm.org/D144654 (John Brawn)

Clang 17 diagnoses unexpected tokens after a #pragma clang|GCC diagnostic push|pop directive.

#pragma clang diagnostic push ignored

Before: No Warning

After:

<source>:1:31: warning: unexpected token in pragma diagnostic [-Wunknown-pragmas]
    1 | #pragma clang diagnostic push ignored
      |                               ^

Commit: https://github.com/llvm/llvm-project/commit/7ff507f1448bfdfcaa91d177d1f655dcb17557e7 (Aaron Ballman)

Clang 17 generates notes and fix-its for ifunc/alias attributes which point to unmangled function names.

__attribute__((used)) static void *resolve_foo() { return 0; }

__attribute__((ifunc("resolve_foo"))) void foo();

Before:

<source>:3:16: error: ifunc must point to a defined function
__attribute__((ifunc("resolve_foo"))) void foo();
               ^

After:

<source>:3:16: error: ifunc must point to a defined function
    3 | __attribute__((ifunc("resolve_foo"))) void foo();
      |                ^
<source>:3:16: note: the function specified in an ifunc must refer to its mangled name
<source>:3:16: note: function by that name is mangled as "_ZL11resolve_foov"
    3 | __attribute__((ifunc("resolve_foo"))) void foo();
      |                ^~~~~~~~~~~~~~~~~~~~
      |                ifunc("_ZL11resolve_foov")

One needs to be aware of the C++ name mangling when using ifunc or alias attributes, but knowing the mangled name from a function signature isn’t an easy task for many people. This change makes the error message highly understandable by suggesting that the ifunc needs to refer to the mangled name, and it also makes this error more actionable by presenting the mangled name.

Commit: https://reviews.llvm.org/D143803 (Dhruv Chawla)

Clang 17 avoids duplicate warnings on unreachable [[fallthrough]]; statements previously issued from -Wunreachable-code and -Wunreachable-code-fallthrough by prioritizing -Wunreachable-code-fallthrough.

void f(int n) {
  switch (n) {
    [[fallthrough]];
  case 1:;
  }
}

Clang invocation command:

clang++ -Wunreachable file.cpp

Before:

<source>:3:5: warning: code will never be executed [-Wunreachable-code]
    [[fallthrough]];
    ^~~~~~~~~~~~~~~~
<source>:3:5: warning: fallthrough annotation in unreachable code [-Wunreachable-code-fallthrough]

After:

<source>:3:5: warning: fallthrough annotation in unreachable code [-Wunreachable-code-fallthrough]
    3 |     [[fallthrough]];
      |     ^

Commit: https://reviews.llvm.org/D145842 (Takuya Shimizu)
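For contrast with the unreachable annotation above, here is a small sketch of a reachable [[fallthrough]]; statement, which should not trigger either warning. The function score_for is invented for this illustration and the attribute needs C++17 or newer.

```cpp
// Hypothetical helper: case 0 deliberately continues into case 1.
int score_for(int n) {
  int score = 0;
  switch (n) {
  case 0:
    score += 1;
    [[fallthrough]];  // reachable, and directly precedes the next case label
  case 1:
    score += 10;
    break;
  default:
    break;
  }
  return score;
}
```

Here score_for(0) runs both case bodies, while score_for(1) runs only the second one.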

Clang 17 correctly emits diagnostics for unavailable attributes that were ignored in Clang 16.

template <class _ValueType = int>
class __attribute__((unavailable)) polymorphic_allocator {};

void f() { polymorphic_allocator<void> a; }

Before: No diagnostics

After:

<source>:4:12: error: 'polymorphic_allocator<void>' is unavailable
    4 | void f() { polymorphic_allocator<void> a; }
      |            ^
<source>:2:36: note: 'polymorphic_allocator<void>' has been explicitly marked unavailable here
    2 | class __attribute__((unavailable)) polymorphic_allocator {};
      |                                    ^

Commit: https://reviews.llvm.org/D147495 (Shafik Yaghmour)

Clang no longer emits -Wunused-variable warnings for variables declared with __attribute__((cleanup(...))) to match GCC’s behavior.

void c(int *);
void f(void) { int __attribute__((cleanup(c))) X1 = 4; }

Before:

<source>:2:48: warning: unused variable 'X1' [-Wunused-variable]
void f(void) { int __attribute__((cleanup(c))) X1 = 4; }
                                               ^

After: No Warning

The cleanup attribute is used to write RAII in C. Objects declared with this attribute are passed as arguments to the function specified in the cleanup attribute when they go out of scope, and thus it is considered better not to diagnose them as unused.

Commit: https://reviews.llvm.org/D152180 (Nathan Chancellor)
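The following sketch shows why such a variable is not really unused. It relies on the GNU cleanup attribute as implemented by GCC and Clang, and the names cleanups_run, release and scoped_work are invented for this illustration.

```cpp
#include <cstdio>

// Counts how often the cleanup handler has run (for demonstration only).
static int cleanups_run = 0;

// The handler receives a pointer to the variable that is leaving scope.
static void release(int *value) {
  std::printf("releasing %d\n", *value);
  ++cleanups_run;
}

// guard is never read by name, yet release(&guard) runs at the end of the
// function body, which is the RAII-like behaviour described above.
static void scoped_work() {
  int __attribute__((cleanup(release))) guard = 4;
}
```

Each call to scoped_work() runs the handler exactly once, even though guard is never mentioned again after its declaration.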

`alignas` specifier

Clang 16 modeled alignas(type-id) as alignas(alignof(type-id)). Clang 17 fixes this modeling and thus fixes the wrong mention of alignof in diagnostics about alignas and _Alignas.

struct alignas(void) A {};

Before:

<source>:1:16: error: invalid application of 'alignof' to an incomplete type 'void'
struct alignas(void) A {};
              ~^~~~~

After:

<source>:1:16: error: invalid application of 'alignas' to an incomplete type 'void'
    1 | struct alignas(void) A {};
      |               ~^~~~~

Commit: https://reviews.llvm.org/D150528 (yronglin)
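As a reminder of what the two spellings mean for a complete type, here is a minimal sketch. The C++ standard gives alignas(type-id) the same effect as alignas(alignof(type-id)), so only the diagnostic wording changes in Clang 17, not the resulting alignment. The struct names are invented for this illustration.

```cpp
// alignas(type-id) requests the alignment of the named type...
struct alignas(double) Packet {
  char tag;
};

// ...which is the same request as spelling out alignas(alignof(type-id)).
struct alignas(alignof(double)) PacketExplicit {
  char tag;
};

static_assert(alignof(Packet) == alignof(double), "aligned like double");
static_assert(alignof(Packet) == alignof(PacketExplicit),
              "both spellings agree");
```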

Shadowings

Clang 17 emits an error when a lambda’s captured variable shadows a template parameter.

auto h = [y = 0]<typename y>(y) { return 0; };

Before: No Error

After:

<source>:1:11: error: declaration of 'y' shadows template parameter
    1 | auto h = [y = 0]<typename y>(y) { return 0; };
      |           ^
<source>:1:27: note: template parameter is declared here
    1 | auto h = [y = 0]<typename y>(y) { return 0; };
      |                           ^

Commit: https://reviews.llvm.org/D148712 (Mariya Podchishchaeva)

Clang 17’s -Wshadow diagnoses shadowings by static local variables.

int var;
void f() { static int var = 42; }

Before: No Warning

After:

<source>:2:23: warning: declaration shadows a variable in the global namespace [-Wshadow]
    2 | void f() { static int var = 42; }
      |                       ^
<source>:1:5: note: previous declaration is here
    1 | int var;
      |     ^

Commit: https://reviews.llvm.org/D151214 (Takuya Shimizu)

`-Wformat`

Clang 17 diagnoses invalid use of scoped enumeration types in format strings, which is Undefined Behavior. Now it also emits a fix-it hint to suggest the use of static_cast to its underlying type to avoid the UB.

#include <limits.h>
#include <stdio.h>

enum class Foo : long {
  Bar = LONG_MAX,
};

int main() { printf("%ld", Foo::Bar); }

Before: No Warning

After:

<source>:8:28: warning: format specifies type 'long' but the argument has type 'Foo' [-Wformat]
    8 | int main() { printf("%ld", Foo::Bar); }
      |                      ~~~   ^~~~~~~~
      |                            static_cast<long>( )

Commit: https://github.com/llvm/llvm-project/commit/3632e2f5179a420ea8ab84e6ca33747ff6130fa2 (Aaron Ballman)

Commit: https://reviews.llvm.org/D153622 (Alex Brachet)
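Applying the suggested fix-it by hand might look like the sketch below. It reuses the Foo enumeration from the example above, while the helper format_foo is invented here so that the result can be inspected in a buffer.

```cpp
#include <climits>
#include <cstddef>
#include <cstdio>

enum class Foo : long { Bar = LONG_MAX };

// Formats a Foo with %ld after converting it to its underlying type, which
// is the conversion the fix-it hint suggests.
static int format_foo(char *buffer, std::size_t size, Foo value) {
  return std::snprintf(buffer, size, "%ld", static_cast<long>(value));
}
```

With the cast in place the argument type matches the %ld conversion, so the call no longer relies on undefined behaviour.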

Clang 17’s -Wformat recognizes %lb and %lB as format specifiers.

#include <cstdio>
int main() { printf("%lb %lB", 10L, 10L); }

Before:

<source>:2:23: warning: length modifier 'l' results in undefined behavior or no effect with 'b' conversion specifier [-Wformat]
int main() { printf("%lb %lB", 10L, 10L); }
                     ~^~
<source>:2:27: warning: length modifier 'l' results in undefined behavior or no effect with 'B' conversion specifier [-Wformat]
int main() { printf("%lb %lB", 10L, 10L); }
                         ~^~

After: No Warning

%b and %B are new formats for printing binary representations of integers specified in the ISO C23 draft. There are already several libc implementations available that support this format. (glibc >= 2.35, for example)

Clang 16 already recognizes %b and %llb as valid format specifiers but handles %lb as invalid. Clang 17 recognizes %lb and %lB to avoid false positive warnings and to emit correct fix-it hints.

Commit: https://reviews.llvm.org/D148779 (Fangrui Song)

Clang often prints the subexpression values of binary operators such as ==, ||, and && in static assertion failures to help users understand the cause of the failure. Clang 17 stops printing subexpression values if the binary operator is || because it is evident that both subexpressions evaluate to false in that case.
The error message for a failed static assertion now points to the asserted expression instead of the static_assert token.

constexpr bool a = false;
constexpr bool b = false;
static_assert(a || b);

Before:

<source>:3:1: error: static assertion failed due to requirement 'a || b'
static_assert(a || b);
^             ~~~~~~
<source>:3:17: note: expression evaluates to 'false || false'
static_assert(a || b);
              ~~^~~~

After:

<source>:3:15: error: static assertion failed due to requirement 'a || b'
    3 | static_assert(a || b);
      |               ^~~~~~

Commit: https://reviews.llvm.org/D147745 (Jorge Pinto Sousa)

Commit: https://reviews.llvm.org/D146376 (Krishna Narayanan)

Clang 17 diagnoses calls to a null function pointer in constexpr evaluation as such instead of just saying it is invalid.

constexpr int call(int (*F)()) {
    return F();
}
static_assert(call(nullptr));

Before:

<source>:4:15: error: static assertion expression is not an integral constant expression
static_assert(call(nullptr));
              ^~~~~~~~~~~~~
<source>:2:12: note: subexpression not valid in a constant expression
    return F();
           ^
<source>:4:15: note: in call to 'call(nullptr)'
static_assert(call(nullptr));
              ^

After:

<source>:4:15: error: static assertion expression is not an integral constant expression
    4 | static_assert(call(nullptr));
      |               ^~~~~~~~~~~~~
<source>:2:12: note: 'F' evaluates to a null function pointer
    2 |     return F();
      |            ^
<source>:4:15: note: in call to 'call(nullptr)'
    4 | static_assert(call(nullptr));
      |               ^~~~~~~~~~~~~

Commit: https://reviews.llvm.org/D145793 (Takuya Shimizu)

Clang 17 displays member function calls in a form closer to the user-written code.

struct Foo {
  constexpr int div(int i) const { return 1 / i; }
};

constexpr Foo obj;
constexpr const Foo &ref = obj;
static_assert(ref.div(0));

Before:

<source>:7:15: error: static assertion expression is not an integral constant expression
static_assert(ref.div(0));
              ^~~~~~~~~~
<source>:2:45: note: division by zero
  constexpr int div(int i) const { return 1 / i; }
                                            ^
<source>:7:19: note: in call to '&obj->div(0)'
static_assert(ref.div(0));
                  ^

After:

<source>:7:15: error: static assertion expression is not an integral constant expression
    7 | static_assert(ref.div(0));
      |               ^~~~~~~~~~
<source>:2:45: note: division by zero
    2 |   constexpr int div(int i) const { return 1 / i; }
      |                                             ^ ~
<source>:7:15: note: in call to 'ref.div(0)'
    7 | static_assert(ref.div(0));
      |               ^~~~~~~~~~

Commit: https://reviews.llvm.org/D151720 (Takuya Shimizu)

When a constexpr variable’s constructor call leaves its subobject uninitialized, Clang 17 prints the uninitialized subobject’s name instead of its type.

struct Foo {
  constexpr Foo() {}
  int val;
};
constexpr Foo ff;

Before:

<source>:5:15: error: constexpr variable 'ff' must be initialized by a constant expression
constexpr Foo ff;
              ^~
<source>:5:15: note: subobject of type 'int' is not initialized
<source>:3:7: note: subobject declared here
  int val;
      ^

After:

<source>:5:15: error: constexpr variable 'ff' must be initialized by a constant expression
    5 | constexpr Foo ff;
      |               ^~
<source>:5:15: note: subobject 'val' is not initialized
<source>:3:7: note: subobject declared here
    3 |   int val;
      |       ^

Commit: https://reviews.llvm.org/D146358 (Takuya Shimizu)
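For completeness, one way to resolve this diagnostic is to give the named subobject an initializer. The following is a minimal sketch that assumes zero is an acceptable default for val; the right initial value depends on the actual code.

```cpp
struct Foo {
  constexpr Foo() {}
  int val = 0; // Default member initializer (assumed value for this sketch).
};

// Every subobject is now initialized, so ff is a valid constant expression.
constexpr Foo ff;
static_assert(ff.val == 0, "val is initialized by its default initializer");
```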

Clang 17 diagnoses an unused const variable template as “unused variable template” instead of “unused variable”.

namespace {
template <typename T> constexpr double var_t = 0;
}

Before:

<source>:2:40: warning: unused variable 'var_t' [-Wunused-const-variable]
template <typename T> constexpr double var_t = 0;
                                       ^

After:

<source>:2:40: warning: unused variable template 'var_t' [-Wunused-template]
    2 | template <typename T> constexpr double var_t = 0;
      |                                        ^~~~~

Uninstantiated templates do not generate symbols, and thus the meaning of unused is broader than for the usual unused variables or functions.

For this reason, -Wunused omits -Wunused-template. This change follows that rationale and leads to fewer unwanted -Wunused-const-variable warnings.

Commit: https://reviews.llvm.org/D152796 (Takuya Shimizu)

Acknowledgements

Special thanks are in order for Timm Bäder, my Google Summer of Code mentor, for his invaluable guidance and support throughout the project.

Further gratitude is extended to my regular reviewers: Aaron Ballman, Christopher Di Bella, and Shafik Yaghmour, for their insightful and constructive feedback that greatly improved my code.

Map LLVM Values to corresponding source level expression, GSoC'23 Project

Tue, 19 Sep 2023 00:00:00 +0000

Hi, my name is Shivam. I worked with the LLVM Foundation in the 2023 edition of GSoC on an interesting project: Map LLVM Values to corresponding source level expression.

Project Scope

Programmers frequently rely on compiler-generated remarks and analysis reports to enhance the efficiency of their code. While compilers excel at including source code positions (such as line and column numbers) in these generated messages, it would be advantageous if these reports also contained the corresponding source-level expressions. The LLVM implementation presently employs a limited set of intrinsic functions to establish a connection between LLVM program elements and source-level expressions. This project’s objective is to leverage the data embedded in these intrinsic functions to generate source expressions that correspond to LLVM values. The optimization of memory accesses within a program is crucial for achieving optimal application performance. Specifically, our goal is to utilize compiler analysis messages that detail source-level memory accesses associated with LLVM load/store pointer values, which can impede compiler optimizations. As an illustration, this information can be used to identify memory access dependencies that hinder vectorization.

The expected result was an interface that takes an LLVM value at any point in the LLVM transformation pipeline and returns a string corresponding to the equivalent source-level expression. We are especially interested in using this interface to map addresses used in load/store instructions to equivalent source-level memory references.

What we did

The core achievement of the project is the development of an analysis pass that operates on LLVM intermediate representation (IR). This analysis pass identifies load and store instructions, and then conducts a recursive traversal to construct source expressions that represent equivalent source-level memory references. This is achieved by utilizing the metadata and debug intrinsics available in the LLVM IR. This pass was integrated into the loop vectorizer framework, which is a significant step towards practical application. Accompanying the implementation, a comprehensive suite of tests was developed to ensure the accuracy and expected behavior of the analysis pass. The analysis pass lives at llvm/lib/Analysis/SourceExpressionAnalysis.cpp.

Implementation Overview

Debug Metadata Handling:

The implementation effectively processes debug metadata associated with instructions. It leverages debug value and declare instructions to retrieve variable names, which are then used to construct source expressions. This enables accurate mapping of LLVM values to their corresponding source-level expressions.

Instruction Types Handling:

The implementation covers a range of instruction types, including binary operators, GetElementPtr, sign extension instructions, LoadInst, and StoreInst. This comprehensive coverage ensures that a wide array of LLVM instructions can be translated into meaningful source-level expressions.

Type and Tag Handling:

The implementation utilizes type information from DIType to determine the type tag, which aids in constructing accurate source-level expressions. Different types are handled appropriately, enhancing the fidelity of the generated expressions.

Expression Construction:

The implementation constructs source-level expressions using the provided LLVM instructions. It combines operand names, operator symbols, and other relevant components to create expressions that closely resemble the original source code.

LoadInst and StoreInst Processing:

The implementation effectively processes LoadInst and StoreInst instructions. It generates source expressions for the loaded and stored values, considering both instruction operands and their associated debug metadata.

Mapping Storage:

The SourceExpressionsMap efficiently stores generated source expressions for various LLVM values. This storage mechanism helps in avoiding redundant calculations and ensures consistent results throughout the analysis.

However, it’s important to note that the generated source expressions are in C/C++ style. Accounting for different source languages and their peculiarities has been beyond the scope of this initial attempt. In addition to developing a separate analysis pass for translating LLVM values into source-level expressions, the implementation was further enhanced by integrating this pass with the Loop Vectorizer. This integration allows for the reporting of source expressions for dependence source and destination pointers in the context of the loop vectorization process. This feature provides valuable insights to developers, aiding in their understanding of memory access patterns and facilitating optimizations.
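To make the recursive construction and the caching idea more concrete, here is a minimal, self-contained sketch. It does not use the LLVM API: ToyValue, SourceExprBuilder and exampleExpression are hypothetical stand-ins invented for this illustration, and the real pass works on llvm::Value and debug metadata instead. The sketch only shows the shape of the technique: recurse over operands, print C/C++-style expressions, and memoize each result in a map.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy stand-in for an LLVM value; not the real llvm::Value hierarchy.
struct ToyValue {
  enum Kind { Variable, Constant, Add, SExt, ElementPtr };
  Kind K;
  std::string Name;                       // Variable name or constant text.
  std::vector<const ToyValue *> Operands; // Inputs of the instruction.
};

class SourceExprBuilder {
public:
  // Returns the source-level expression for V, computing it at most once.
  std::string get(const ToyValue *V) {
    auto It = Cache.find(V);
    if (It != Cache.end())
      return It->second;
    std::string Expr = build(V);
    Cache.emplace(V, Expr);
    return Expr;
  }

private:
  std::string build(const ToyValue *V) {
    switch (V->K) {
    case ToyValue::Variable: // Name recovered from debug info in the real pass.
    case ToyValue::Constant:
      return V->Name;
    case ToyValue::Add:
      return "(" + get(V->Operands[0]) + " + " + get(V->Operands[1]) + ")";
    case ToyValue::SExt: // Sign extension is invisible at the source level.
      return get(V->Operands[0]);
    case ToyValue::ElementPtr: // Address of an array element.
      return "&" + get(V->Operands[0]) + "[" + get(V->Operands[1]) + "]";
    }
    return "<unknown>";
  }

  // Plays the role that SourceExpressionsMap has in the description above.
  std::map<const ToyValue *, std::string> Cache;
};

// Builds the expression for the address of A[i + 1].
std::string exampleExpression() {
  ToyValue A{ToyValue::Variable, "A", {}};
  ToyValue I{ToyValue::Variable, "i", {}};
  ToyValue One{ToyValue::Constant, "1", {}};
  ToyValue Sum{ToyValue::Add, "", {&I, &One}};
  ToyValue Ext{ToyValue::SExt, "", {&Sum}};
  ToyValue Ptr{ToyValue::ElementPtr, "", {&A, &Ext}};
  SourceExprBuilder Builder;
  return Builder.get(&Ptr);
}
```

In this toy model exampleExpression() returns "&A[(i + 1)]", which has the same shape as the expressions the pass reports for load/store pointers.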

The Current State

The project has successfully delivered the core functionality of generating source expressions for load and store instructions, covering array and pointer memory references. While initial attempts were made to handle complex structures like structs, this aspect is currently outside the project’s scope.

Structs pose a unique challenge due to their intricate representation within the LLVM Intermediate Representation (IR). While the project did make initial attempts to incorporate basic support for handling structs, the complexity of nested structures presented significant difficulties. As a result, we encountered obstacles in accurately extracting source expressions for structs and their complex compositions.

The code has not been merged yet; the patch still needs review from other community members. The pull request can be tracked on GitHub: Map LLVM Values to source level expression.

Let’s look at how the analysis pass provides useful source-level expressions for the memory dependencies in a loop.

//test.c
void test_backward_dep(int n, int *A) {
  for (int i = 1; i <= n-3; i += 3) {
    A[i] = A[i-1];
    A[i+1] = A[i+3];
  }
}

Generate the LLVM IR file (*.ll) using

clang -O3 -S -g -emit-llvm test.c

(Assuming test.ll file gets generated)

Use the opt command below to run the loop vectorizer and emit its analysis remarks along with the source expressions

opt -report-source-expr=true -passes='function(loop-vectorize,require<access-info>)' -disable-output -pass-remarks-analysis=loop-vectorize test.ll

Note: A command-line option, ReportSourceExpr, was introduced to control the reporting of source expressions for load/store pointers. With the option set to true (-report-source-expr=true), the optimization remarks additionally include the source expressions associated with dependence source and destination pointers.

Output remarks

remark: test.c:3:12: loop not vectorized: unsafe dependent memory operations in the loop. Use #pragma loop distribute(enable) to allow loop distribution to attempt to isolate the offending operation into a separate loop
Dependence source: &A[(i + -3)]
Dependence destination: &A[(i + 1)]

This output includes information about the loop that wasn’t vectorized due to unsafe dependent memory operations. The interesting part for us is the source expressions for the dependence source and destination.

This quick demonstration shows how the analysis can be integrated into the compilation process to provide valuable insights into the memory access patterns and their implications for Loop Vectorization.

Challenges and learnings

One of the challenges faced during the project was integrating support for complex structures like structs. These structures require specialized handling due to their intricacies in the LLVM IR. However, this aspect revealed the depth of understanding needed for successful interaction with LLVM’s IR and debug metadata. The project was an interesting journey, allowing for deep exploration of LLVM IR and a practical understanding of optimization remarks and metadata. Additionally, working with the loop vectorizer provided insights into its functionality and integration with custom analyses. Overall, the project served as a stepping stone for me toward becoming an active contributor to the LLVM community. It provided invaluable learning opportunities and practical insights into compiler optimizations and LLVM’s architecture.

Future Work

• Handling structs and complex data types
• Support for other LLVM instructions
• Accurately building the source expression when optimizations heavily alter the source-level data in the IR
• Possible integration with the LLVM debugger
• Support for multiple source languages, which would require defining mappings from LLVM constructs to constructs in each target language

Final Words

I really want to thank the LLVM Foundation and my mentors Karthik Senthil and Satish Guggila for guiding me through the project. It was an amazing experience for me to work on this project. I hope to stay active in LLVM and compilers. More details of this project can be found in this final report. Feel free to reach out to me at [email protected] to discuss this patch or anything else.

Adding a new target/object backend to LLVM JITLink

Tue, 28 Mar 2023 00:00:00 +0000

Motivation

For the last year, I have been contributing to LLVM JITLink. This post aims to doubly serve as a summary of my work and documentation for future contributors looking to add a new target/object backend to LLVM JITLink.

We will start by establishing some background and definitions of relevant concepts. Then, we will talk about what the project actually entailed. Finally, we will go over the execution details of the project.

The end goal of the project was to make LLVM JITLink capable of linking a 32-bit ELF object file, with i386 specific relocations, into a 32-bit process on the i386 hardware architecture.
If the goal of the project already makes sense to you and you are looking to get started with adding a new target/object backend to LLVM JITLink yourself, you can skip to the “Recap and conveniences” section.

Background

Linking

Our code often relies on external dependencies. For example, even a simple hello-world program written in C depends on the C stdlib for the printf function. These external dependencies are expressed as symbolic references, which I will henceforth refer to as just symbols. Symbols are names of data or functions that have unknown addresses and are resolved or fixed up during the linking process.

In chronological order -

The compiler converts source code to machine code.
The assembler converts machine code to object files (ELF, MachO, COFF etc.)
The linker links one or more object files (fixing up symbolic references along the way) and produces an executable or a shared library (also called shared object or dylib).

For the purposes of this discussion we will focus on executables, but the points that will be made hold for shared objects as well.

JIT linking

Unlike static linking, JIT (Just-in-time) linking is performed at runtime. While a static linker produces executables that are stored on disk, a JIT linker produces an in-memory image of the executable, essentially ready-to-execute bytes in memory. JIT linking a C program may feel very much like running a shell script. Under the hood though, the C program is linked into the memory of the invoking process, also commonly referred to as the executor process. The JIT linker patches up the executor process’ memory to account for the addresses of symbols at runtime, and executes necessary initializers.

If you are familiar with dynamic loading then JIT linking may sound familiar, and the two have a lot in common; however, they are not the same. JIT linking operates on relocatable objects (vs shared objects/dylibs for dynamic loading), and performs both the static linker’s and the dynamic loader’s jobs. Doing so allows the JIT linker to dead-strip redundant symbols, which dynamic loading cannot do, and this allows JIT linking to support finer grained compilation of languages that tend to produce a lot of redundant symbol definitions (e.g. C++).

Need for JIT linking

JIT linking is primarily useful in the context of pre-compiled languages, such as C, C++, Rust etc. Why? At run time, these languages have no way¹ to bring new symbol definitions into a running process’ memory and resolve references to them. Although dynamic loading partially solves this issue, it has its drawbacks (discussed above) and lags far behind the static linking experience.

With JIT linking, at run time, symbolic references can be resolved to existing symbols (from the newly JIT’d code), or to newly JIT’d symbols (from the pre-compiled code). The toy example below shows what this looks like in code.

// Let's assume we have the following, rather contrived,
// C++ program that wants to add 2 numbers, but wants to use
// an `add` function from a relocatable object file supplied by
// the user.
//
// Let's also assume that the add function in the user-supplied
// relocatable object will reference a symbol named `MAGIC` in its
// definition.

const int MAGIC = 42;

int main(int argc, char* argv[]) {
  int a = 1;
  int b = 2;

  // Read the path of the user supplied relocatable object.
  string userSuppliedObjectPath = ...;

  // Initialize your JIT class that uses JIT linking under the hood.
  JIT J;

  // Add the relocatable object to your JIT.
  J.addObject(userSuppliedObjectPath);

  // Lookup the `add` function in the newly added JIT object.
  // Once all symbolic references within the user supplied object
  // are resolved, the content is fixed up and emitted to memory.
  // And we can then get a pointer to the `add` function.
  auto *add = (int(*)(int, int))J.lookup("add").getAddress();

  // At this point the symbolic reference to `MAGIC` in add's
  // definition must have been resolved to the memory address
  // of the constant `MAGIC` that we defined in this program.
  // Run the add function found in the JIT module.
  int result = add(a, b);
}

That said, JIT linking by itself is not something that is very useful for an end user. JIT linking is an enabler for certain use-cases with pre-compiled languages (some use-cases exist for JIT-compiled languages as well²).

JIT compilers (think of something like the JIT compilation component of the Java Hotspot VM, but for a statically compiled language)
Debugger expression evaluators (such as the LLDB expression evaluator)
REPLs (such as Cling and the currently experimental³ Clang-REPL)
Standalone scripts (such as the Swift scripts, where the JIT linking is used to add an immediate mode to the compiler, which runs your code in-place via a JIT, rather than compiling it)
Scriptable extensions (think about running JIT’d code in the context of some existing app, allowing the app to be extended by JIT’d code rather than precompiled plugins)

While the above use cases may seem different, they are really the same: JIT linking enables linking code into existing processes (that may or may not already contain state/context), in an ABI-compatible way.

LLVM JITLink

LLVM JITLink is a JIT linking implementation, in the form of a low-level library within the LLVM infrastructure. It powers LLVM’s ORC JIT APIs, which is what end-users would usually use for building runtime linking environments. It provides primitives for:

Re-using existing compilers to generate relocatable objects at runtime.
Allocating memory within a target executor process.
Linking code into a target executor process in an ABI-compatible way.

In simple words, a program Y, running in a process X, can hand JITLink a relocatable object file and JITLink will link the object file’s code into X’s memory and run it under X’s existing context (globals, functions etc.), as if it were part of a dynamic library loaded into process X⁴.

The project

Having set up all that background, let’s understand the main task and the end goal of the project.

The task - Adding the i386 (target) / ELF (object) backend to JITLink

What is a target?
1. Target here refers to a hardware architecture. i386 is a 32-bit x86 architecture.
What is an object?
1. Object here refers to an object file format. ELF is the object format commonly used on Linux systems.
Why do different target/object combinations matter and need additional work?
1. Different target/object combinations matter because each combination may use distinct methods for connecting symbolic references to symbol definitions. These methods are commonly referred to as relocations.

The end goal

The end goal of the project was to make LLVM JITLink capable of linking a 32-bit ELF object file, with i386 specific relocations, into a 32-bit process on the i386 hardware architecture.

Execution

Understanding high level constructs

LinkGraph

The LLVM JITLink documentation has an excellent description of LinkGraph. I recommend reading it after the below, high-level description of LinkGraph.

LinkGraph is an internal representation of an object file within LLVM JITLink. While object formats may have different schemas and terminology for similar concepts, they all aim to represent machine code that can be relocated in virtual memory. The purpose of a LinkGraph is to provide a generic representation of these concepts and nuances across different object file formats.

To draw conceptual analogies between the LinkGraph and an object format, let’s use ELF as an example. An ELF object contains:

Sections - Any chunk of bytes that must be moved into memory as a unit.
Symbols - A named chunk of bytes that could represent either data or executable instructions. Symbols occur as children of sections.
Relocations - A description of how to fix up bytes within a section once the address of the relocation’s target symbol is resolved.

A LinkGraph is capable of representing all of the above concepts. It first defines some building blocks.

Addressable - Anything that can be assigned an address in the executor process’ virtual address space.
Block - A chunk of bytes that is addressable and occurs as part of a section.

On top of these building blocks, it defines the higher level object format concepts.

Symbol - Equivalent of a symbol in the ELF format. Represented using an offset from the base (address) of a Block and a size in bytes.
Section - Equivalent of a section in the ELF format. Represented using a collection of symbols and blocks.
Edge - Equivalent of a relocation in the ELF format. Represented using an offset from the start of the containing block (indicating the storage location that needs to be fixed up), a pointer to the target whose address needs to be used for the fix-up and a kind to specify the patching formula.
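The relationships between these concepts can be sketched as a handful of plain structs. This is a deliberately simplified model for illustration only: the type and member names below are my own, they are not the real llvm::jitlink classes, and details such as external symbols and memory protections are left out.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct Symbol; // Forward declaration so that Edge can point at its target.

// Edge: "fix up the bytes at Offset in the containing block, using the
// address of Target and the formula selected by Kind".
struct Edge {
  std::uint32_t Offset = 0;       // Offset from the start of the block.
  const Symbol *Target = nullptr; // Symbol whose address is needed.
  int Kind = 0;                   // Selects the patching formula.
};

// Block: an addressable chunk of bytes that is part of a section.
struct Block {
  std::uint64_t Address = 0;         // Assigned when memory is allocated.
  std::vector<std::uint8_t> Content; // Bytes that are moved as a unit.
  std::vector<Edge> Edges;           // Fix-ups that apply to Content.
};

// Symbol: a named range inside a block.
struct Symbol {
  std::string Name;
  const Block *Base = nullptr;
  std::uint64_t Offset = 0; // Offset from the base address of the block.
  std::uint64_t Size = 0;   // Size in bytes.

  // The address of a symbol follows from the address of its block.
  std::uint64_t address() const { return Base->Address + Offset; }
};

// Section: a collection of blocks and the symbols defined in them.
struct Section {
  std::string Name;
  std::vector<Block *> Blocks;
  std::vector<Symbol *> Symbols;
};
```

The useful property of this layout is that assigning an address to a block implicitly assigns addresses to every symbol inside it, which is what later makes the fix-up step a simple walk over the edges.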

JITLinkContext

JITLinkContext represents the target process that you’re linking into, and it provides the JIT linker with the ability to ask questions about and take actions within the process. This includes the ability to look up symbols and allocate memory, in the target process, as well as to publish the results of the linking process to the broader environment. Specifically, the JITLinkContext informs others of the addresses it has assigned to symbols and when those symbols become available in memory.

Understanding the JIT linking algorithm

The LLVM JITLink linking algorithm happens in multiple phases, with each phase consisting of passes over the LinkGraph and a call to the next phase at the end. In each phase the algorithm modifies the LinkGraph as needed and, by the end, produces a ready-to-execute in-memory image of the relocatable object that we started out with.

Something that did not click for me initially, but simplified things significantly once it did, was the fact that the LinkGraph was just that, a graph! Re-reading LLVM JITLink’s high-level description of the generic JIT linking algorithm with this simple view of the LinkGraph made it much easier and more intuitive to make sense of what was going on in the JIT linking process.

The algorithm also provides implementers and users of JITLink with hooks to tap into the linking process. These hooks can be used to achieve a number of things, including but not limited to link-time optimizations, testing, validation etc.
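The idea of passes and hooks over a graph can be sketched in a few lines. Everything below is a toy model with hypothetical names (ToyGraph, ToyPass, runPasses, pruneDeadSymbols); it is not the JITLink pass API, and it only illustrates how a built-in step such as dead-stripping and a user-supplied hook can share one pass list.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy graph: just a list of symbols with a liveness flag.
struct ToySymbol {
  std::string Name;
  bool Live;
};

struct ToyGraph {
  std::vector<ToySymbol> Symbols;
};

// A pass is any callable that may inspect or modify the graph.
using ToyPass = std::function<void(ToyGraph &)>;

// Runs the passes in order; each pass sees the result of the previous one.
void runPasses(ToyGraph &G, const std::vector<ToyPass> &Passes) {
  for (const ToyPass &P : Passes)
    P(G);
}

// Example of a built-in step: remove every symbol that is not live.
void pruneDeadSymbols(ToyGraph &G) {
  G.Symbols.erase(std::remove_if(G.Symbols.begin(), G.Symbols.end(),
                                 [](const ToySymbol &S) { return !S.Live; }),
                  G.Symbols.end());
}
```

A user hook is then just another entry in the pass list, for example a lambda placed after pruneDeadSymbols that records how many symbols survived, which is handy for testing and validation.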

The tangibles

First, I set up a test loop to validate whether LLVM JITLink is able to link 32-bit i386 ELF objects, containing valid i386/ELF relocations, into a 32-bit process. The existing llvm-jitlink tool, which is built and put into the bin folder by default when you build the LLVM project, came in handy. llvm-jitlink is a command line wrapper for the JITLink library. It takes relocatable objects as input and links them into the executor process using JITLink.

The tricky part here, at least for me, was to get a 32-bit llvm-jitlink ELF executable. By default, Clang produces executables for the host architecture, so I had to understand cross-compilation⁵ (compiling for a target different from the host architecture) since I was developing on x86-64 hardware. In order to obtain a 32-bit llvm-jitlink ELF executable on an x86-64 system, I needed the following -

Cross-compiler - A cross-compiler that could generate 32-bit x86 code. Clang generates 32-bit x86 code if the following flags are specified in the build configuration:
1. CMAKE_CXX_FLAGS="-m32" or CMAKE_C_FLAGS="-m32" - instructs Clang to generate 32-bit code instead of the default 64-bit code.
2. LLVM_DEFAULT_TARGET_TRIPLE=i386-unknown-linux-gnu - instructs Clang to generate machine code for the i386 target by default.
Target shared libraries - 32-bit x86 shared libraries that might be checked against during compilation. In my case installing libstdc++.i686 and glibc-devel.i686 sufficed since that is all I needed to generate programs containing all possible i386/ELF relocations.

The full command that I used to generate my build configuration was -

# LLVM_TABLEGEN: it is important that the `llvm-tblgen` executable is for the host architecture.
# LLVM_TARGETS_TO_BUILD: set of targets that the compiler must be able to generate code for.
# Can save compilation time by omitting redundant target backends.
cmake -DCMAKE_CXX_FLAGS="-m32" -DCMAKE_C_FLAGS="-m32" \
  -DCMAKE_CXX_COMPILER=<PATH_PREFIX>/bin/clang++ \
  -DCMAKE_BUILD_TYPE=Debug \
  -DLLVM_TABLEGEN=<LLVM_BUILD_DIR_FOR_HOST_ARCH>/bin/llvm-tblgen \
  -DLLVM_DEFAULT_TARGET_TRIPLE=i386-unknown-linux-gnu \
  -DLLVM_TARGETS_TO_BUILD=X86 \
  -G "Ninja" ../llvm

The last piece of my test loop was the plumbing in LLVM JITLink, on top of which I could start adding i386/ELF relocations. I added this plumbing as part of my first commit to LLVM JITLink. At a high level, there were 2 things that I implemented in that commit -

ELFLinkGraphBuilder_i386 - contained specialized logic for parsing i386/ELF relocations from an object file.
ELFJITLinker_i386 - contained specialized logic for fixing up i386/ELF relocations in the executable image supposed to be emitted to memory.

Having set up a test loop, I incrementally added support for the following i386/ELF relocations to LLVM JITLink.

Quick aside, before we talk about the individual relocations! Let’s recall what relocations are.
The compiler generates code which contains symbolic references to actual symbols (everything other than local variables in a function and functions themselves). The compiler just refers to symbols by the names used by the programmer and leaves a set of TODOs for the linker to complete during linking.
In ELF objects, these TODOs are found in the relocation section. They tell the linker where and how a symbolic reference needs to be fixed. The linker then, for the most part, follows the compiler’s instructions and resolves all the relocations in the program. The linker can resolve relocations because it has a view of the entire compiled program.

R_386_32

What - Tells the linker to replace the symbolic reference with the symbol’s absolute memory address.
When - Used to reference global and static variables in non position-independent code (PIC). PIC allows code to be loaded at any address in memory, rather than at a fixed address.

Code -

// Compile with => clang -m32 -c -o obj.o obj.c

// declare a global variable x
int x;

int main() {
  // Compiler should generate a R_386_32 relocation here.
  x += 1;
  return 0;
}

00000000 <main>:
   0: 55                      push   %ebp
   1: 89 e5                   mov    %esp,%ebp
   3: 50                      push   %eax
   4: c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%ebp)
// Compiler wants to move the value of x into
// the eax register but doesn't know the address
// of x. So it leaves a TODO for the linker and
// temporarily uses 0 as x's address.
   b: a1 00 00 00 00          mov    0x0,%eax
         c: R_386_32 x
  10: 83 c0 01                add    $0x1,%eax
// Same thing here
  13: a3 00 00 00 00          mov    %eax,0x0
         14: R_386_32 x
  18: 31 c0                   xor    %eax,%eax
  1a: 83 c4 04                add    $0x4,%esp
  1d: 5d                      pop    %ebp
  1e: c3                      ret
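The fix-up itself is small. Below is a minimal sketch of applying an R_386_32 relocation to a byte buffer, using the standard formula S + A (symbol address plus addend). On i386/ELF the addend is implicit, meaning it is read from the bytes already stored at the fix-up location (the zeros in the disassembly above). The helper names are hypothetical and this is not the code from the JITLink backend.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a 32-bit little-endian value starting at Offset.
std::uint32_t readLE32(const std::vector<std::uint8_t> &Bytes,
                       std::size_t Offset) {
  return static_cast<std::uint32_t>(Bytes[Offset]) |
         static_cast<std::uint32_t>(Bytes[Offset + 1]) << 8 |
         static_cast<std::uint32_t>(Bytes[Offset + 2]) << 16 |
         static_cast<std::uint32_t>(Bytes[Offset + 3]) << 24;
}

// Writes Value as 32-bit little-endian starting at Offset.
void writeLE32(std::vector<std::uint8_t> &Bytes, std::size_t Offset,
               std::uint32_t Value) {
  for (int I = 0; I < 4; ++I)
    Bytes[Offset + I] = static_cast<std::uint8_t>(Value >> (8 * I));
}

// R_386_32: patch the location with S + A, where A is the implicit addend
// currently stored at the fix-up location.
void applyAbs32(std::vector<std::uint8_t> &Content, std::size_t FixupOffset,
                std::uint32_t SymbolAddress) {
  std::uint32_t Addend = readLE32(Content, FixupOffset);
  writeLE32(Content, FixupOffset, SymbolAddress + Addend);
}
```

For the `mov 0x0,%eax` instruction above, the fix-up offset is 1 byte past the opcode a1, so applying the relocation turns the four zero bytes into the little-endian address that the linker assigned to x.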

R_386_PC32

What - Tells the linker to resolve the symbolic reference using the symbol’s relative offset to the current program counter (PC). The linker finds the offset of the referenced symbol, relative to the PC and hard-codes it in the corresponding assembly instruction. At run time, the processor looks at the call instruction’s encoding and knows that the operand to the instruction represents the symbol’s offset to the PC.
When - Used to call functions in PIC.

Code -

// Compile with => clang -m32 -ffunction-sections -c -o obj.o obj.c

// declare a global function x
void x() {}

int main() {
  // Compiler should generate a R_386_PC32 relocation here.
  x();
  return 0;
}

00000000 <x>:
   0: 55                      push   %ebp
   1: 89 e5                   mov    %esp,%ebp
   3: 5d                      pop    %ebp
   4: c3                      ret

00000000 <main>:
   0: 55                      push   %ebp
   1: 89 e5                   mov    %esp,%ebp
   3: 83 ec 08                sub    $0x8,%esp
   6: c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%ebp)
// Compiler wants to call function x
// but doesn't know its address. So it leaves
// a TODO for the linker and temporarily uses garbage
// bytes as x's address.
//
// The linker will replace the garbage bytes 0xfffffffc
// with `offset => address of x - PC`.
// `e8` here tells the i386 processor that the operand
// is a PC relative offset and that the address of x needs
// to be computed using `PC + offset`
   d: e8 fc ff ff ff          call   e <main+0xe>
         e: R_386_PC32 x
  12: 31 c0                   xor    %eax,%eax
  14: 83 c4 08                add    $0x8,%esp
  17: 5d                      pop    %ebp
  18: c3                      ret
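A worked example may help here. The standard formula for R_386_PC32 is S + A - P: symbol address, plus addend, minus the address of the fix-up location. The bytes 0xfffffffc stored in the call instruction are the implicit addend, -4, which accounts for the processor adding the displacement to the address of the next instruction rather than to the fix-up location. The sketch below uses hypothetical helper names and made-up addresses (main loaded at 0x1000, x at 0x2000) purely for illustration.

```cpp
#include <cstdint>

// R_386_PC32: value to store at the fix-up location (S + A - P).
// Unsigned arithmetic wraps modulo 2^32, which matches 32-bit hardware.
constexpr std::uint32_t pcRel32Value(std::uint32_t SymbolAddress,
                                     std::int32_t Addend,
                                     std::uint32_t FixupAddress) {
  return SymbolAddress + static_cast<std::uint32_t>(Addend) - FixupAddress;
}

// What the processor does for `call rel32`: it adds the displacement to
// the address of the next instruction.
constexpr std::uint32_t callTarget(std::uint32_t NextInstructionAddress,
                                   std::uint32_t Displacement) {
  return NextInstructionAddress + Displacement;
}

// Assumed layout: main at 0x1000, so the fix-up location (offset 0xe) is at
// 0x100e and the instruction after the call (offset 0x12) is at 0x1012.
static_assert(pcRel32Value(0x2000, -4, 0x100e) == 0xfee,
              "displacement stored in the call instruction");
static_assert(callTarget(0x1012, pcRel32Value(0x2000, -4, 0x100e)) == 0x2000,
              "the call lands on x");
```

The two static_asserts check the round trip: the value the linker writes, and the address the processor computes from it.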

Another short detour to talk about dynamic linking because the remaining relocations are what enable dynamic linking.
In static linking, if your program accesses even a single symbol from a given library, then that entire library is linked with your program, which among other issues, increases the size of the generated executable. For instance, let’s talk about that simple C program that just prints hello world again. With static linking, the executable that’s generated from your program is going to pull in the entire C standard library, because your program accessed the printf function.
In dynamic linking, referenced libraries are accessed at build time but they are not brought into the linked executable. Instead, the referenced global variables from these libraries are linked at load time (when the program is loaded into memory, to be run) and referenced functions from these libraries are linked at invocation time.
There are pros and cons to both approaches, whose details I will not go into, but will cursorily mention below.
With static linking the only thing the user of your executable needs is the executable itself. They won’t run into issues of missing libraries.
With dynamic linking you don’t need to update your executable if the shared library is updated. This is especially useful if you are distributing your executable.
Dynamic linking is just harder to implement than static linking.

If you’re not already familiar with the concepts of GOT and PLT, I also recommend you take yet another quick detour for some visual explanations!

R_386_GOTPC -

What - Tells the linker to replace the symbolic reference with the delta between the storage location where the relocation has to be applied (the fixup location) and the address of the GLOBAL_OFFSET_TABLE (GOT) symbol.
When - This relocation isn’t used in isolation. Rather it is an enabler for R_386_GOTOFF, R_386_GOT32 and R_386_PLT32, which need to use the memory address of the GOT.

Code -

// Compile with => clang -m32 -fPIC -c -o obj.o obj.c

// Declare a global static
static int a = 42;

int main() {
  // Since we passed the `PIC` flag to Clang to
  // indicate that we want position independent code
  // Clang will generate code to access `a` using
  // the GOT.
  return a;
}

00000000 <main>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 50 push %eax
// This `call` instr is just telling the processor to
// push the next instr's address on the stack and jump to
// address 9. But 9 is the address of the next line. That's
// weird...
4: e8 00 00 00 00 call 9 <main+0x9>
// And now that we did jump to 9, all we did was pop
// the value that was on the stack and store it in eax.
// Wasn't the value on the stack just 9's address? Even
// weirder...
9: 58 pop %eax
// Wait a minute. The compiler left a TODO here for the
// linker, to find the delta between the fixup location
// and the address of the GOT.
//
// Ok, so if the address of the GOT was let's say 0x20,
// then the linker will try to hardcode the value
// `0x20-0xc => 0x14` and add it to the value in eax (0x9),
// which will give us `0x14 + 0x9 => 0x1d`.
//
// Ah, that's not the address of the GOT. Yes, but
// `0x1d + 0x3 => 0x20` is. Well, where is the 3 coming from?
// The compiler helps us here, a bit. The address in eax isn't
// the address of the fixup location, it's off by 0x3. So along
// with leaving us a TODO, the compiler also leaves us a reminder
// to add 0x3 to our delta calculation, in order to arrive at
// the correct address of the GOT.
a: 81 c0 03 00 00 00 add $0x3,%eax
    c: R_386_GOTPC _GLOBAL_OFFSET_TABLE_
// Not super important what happens after the R_386_GOTPC
// relocation is resolved for now...
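The numbers in the comments above can be replayed in a few lines. This sketch uses the same made-up GOT address (0x20) as the listing and treats the `03 00 00 00` placeholder as the addend. The function name is hypothetical, and this is not how JITLink structures its code.

```python
MASK32 = 0xFFFFFFFF


def resolve_gotpc(got_addr, addend, fixup_addr):
    """Value written at the fixup location: GOT + A - P, truncated to 32 bits."""
    return (got_addr + addend - fixup_addr) & MASK32


GOT_ADDR = 0x20    # pretend address of _GLOBAL_OFFSET_TABLE_ from the comments
FIXUP_ADDR = 0xC   # where the 4-byte operand of the `add` lives
ADDEND = 0x3       # placeholder operand left by the compiler
POPPED_PC = 0x9    # value the call/pop pair leaves in the register

operand = resolve_gotpc(GOT_ADDR, ADDEND, FIXUP_ADDR)
# The `add` instruction then turns the popped PC into the GOT address.
got_in_register = (POPPED_PC + operand) & MASK32
print(hex(operand), hex(got_in_register))  # 0x17 0x20
```

The addend of 3 is exactly the distance between the address that was popped (0x9) and the fixup location (0xc), which is why the sum lands on the GOT.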

R_386_GOTOFF -

What - Tells the linker to resolve the symbolic reference with the offset between the symbol’s address and the address of the GOT’s base (computed and stored in a register when the R_386_GOTPC relocation is handled).
When - Used by shared libraries and executables to access internal symbols in a position independent way.

Code -

// Compile with => clang -m32 -fPIC -c -o obj.o obj.c

// Declare a global static
static int a = 42;

int main() {
  // Since we passed the `PIC` flag to Clang to
  // indicate that we want position independent code
  // Clang will generate code to access `a` using
  // the GOT.
  return a;
}

00000000 <main>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 50 push %eax
4: e8 00 00 00 00 call 9 <main+0x9>
9: 58 pop %eax
// We saw above how the R_386_GOTPC relocation gets resolved
// and that the eax register contains the address of the
// GOT after the relocation is resolved.
a: 81 c0 03 00 00 00 add $0x3,%eax
    c: R_386_GOTPC _GLOBAL_OFFSET_TABLE_
10: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%ebp)
// Compiler wants to access `a`, but since we told it
// to generate position-independent code, it generates access
// to `a` using the GOT and leaves a TODO for the linker to find
// the offset of `a` from the base of the GOT.
//
// The address of the base of the GOT is already available
// at this point - it's stored in eax. The linker computes the
// address of `a` and fixes up the 4 bytes after `0x8b 0x80`, to
// store the offset between `a` and the GOT's base.
17: 8b 80 00 00 00 00 mov 0x0(%eax),%eax
    19: R_386_GOTOFF a
1d: 83 c4 04 add $0x4,%esp
20: 5d pop %ebp
21: c3 ret
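As a rough illustration of the GOTOFF calculation, the sketch below computes the signed displacement between a symbol and the GOT base and then shows that adding it back to the GOT base recovers the symbol's address. All addresses are invented for the example, and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def resolve_gotoff(symbol_addr, addend, got_base):
    """Value written at the fixup location: S + A - GOT, truncated to 32 bits."""
    return (symbol_addr + addend - got_base) & MASK32


def to_signed32(value):
    """Interpret a 32-bit pattern as a signed displacement."""
    return value - (1 << 32) if value & 0x80000000 else value


GOT_BASE = 0x3000  # hypothetical GOT address held in the base register
A_ADDR = 0x2FF0    # hypothetical address of the static variable `a`
ADDEND = 0x0       # placeholder bytes 00 00 00 00 in the `mov`

displacement = resolve_gotoff(A_ADDR, ADDEND, GOT_BASE)
# `mov disp(%reg),%eax` reads from register + displacement.
effective_addr = (GOT_BASE + to_signed32(displacement)) & MASK32
print(hex(displacement), hex(effective_addr))  # 0xfffffff0 0x2ff0
```

Note that `a` never gets a GOT entry here. The GOT base is only used as a position-independent anchor from which `a` is reached.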

R_386_GOT32

What - Tells the linker to resolve the symbolic reference with the offset between the address of the GOT’s base and the symbol’s entry in the GOT (essentially computing an index into the GOT).
When - Used by shared libraries and executables to access external data symbols in a position independent way.

Code -

// Compile with => clang -m32 -fPIC -c -o obj.o obj.c

// Declaring that `a` is defined externally.
extern int a;

int main() {
  // Since we passed the `PIC` flag to Clang to
  // indicate that we want position independent code
  // Clang will generate code to access `a` using
  // the GOT.
  return a;
}

00000000 <main>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 50 push %eax
4: e8 00 00 00 00 call 9 <main+0x9>
9: 59 pop %ecx
// We saw above how the R_386_GOTPC relocation gets resolved
// and that the ecx register contains the address of the
// GOT after the relocation is resolved.
a: 81 c1 03 00 00 00 add $0x3,%ecx
    c: R_386_GOTPC _GLOBAL_OFFSET_TABLE_
10: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%ebp)
// Compiler wants to access `a`, but since we told it to
// generate position-independent code, it generates access to
// `a` using the GOT and leaves a TODO for the linker to find
// the offset of `a`'s GOT entry from the base of the GOT.
//
// `a` got a GOT entry because we did not define it internally
// and the compiler thinks that it will either come from another
// source file or a shared library.
//
// The address of the base of the GOT is already available
// at this point - it's stored in ecx. The linker finds the address of
// `a`'s GOT entry and fixes up the 4 bytes after `0x8b 0x81`,
// to store the offset between `a`'s GOT entry and the GOT's base.
17: 8b 81 00 00 00 00 mov 0x0(%ecx),%eax
    19: R_386_GOT32 a
// eax, at this point contains `a`'s address, which is dereferenced
// in this mov instruction and stored into eax itself.
1d: 8b 00 mov (%eax),%eax
1f: 83 c4 04 add $0x4,%esp
22: 5d pop %ebp
23: c3 ret
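The two loads at the end of the listing (first the GOT entry, then the variable itself) can be mimicked with a toy memory model. The addresses, the dictionary-based memory and the helper names are all assumptions made for this sketch.

```python
MASK32 = 0xFFFFFFFF


def resolve_got32(got_entry_addr, addend, got_base):
    """Offset of the symbol's GOT entry from the GOT base, plus the addend."""
    return (got_entry_addr + addend - got_base) & MASK32


# Hypothetical layout: a GOT at 0x3000 whose slot at 0x300C belongs to `a`,
# and the external variable `a` itself living at 0x5000 with the value 42.
GOT_BASE = 0x3000
A_GOT_ENTRY = 0x300C
A_ADDR = 0x5000
memory = {A_GOT_ENTRY: A_ADDR, A_ADDR: 42}

displacement = resolve_got32(A_GOT_ENTRY, 0, GOT_BASE)

# mov disp(%ecx),%eax : load the address of `a` from its GOT entry
address_of_a = memory[GOT_BASE + displacement]
# mov (%eax),%eax     : dereference that address to get the value of `a`
value_of_a = memory[address_of_a]
print(hex(displacement), hex(address_of_a), value_of_a)  # 0xc 0x5000 42
```

Compared with R_386_GOTOFF, there is one extra memory access: the code does not know where `a` lives, only where to find the slot that will hold its address.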

R_386_PLT32

What - Tells the linker to resolve the symbolic reference with the symbol’s PLT entry.
When - Used by shared libraries and executables to access external function symbols in a position-independent way.
Code -

// Compile with => clang -m32 -fPIC -c -o obj.o obj.c

// Declaring that `foo` is a function defined externally.
extern int foo(void);

int main(void) {
  // Since we passed the `PIC` flag to Clang to
  // indicate that we want position independent code
  // Clang will generate code to access `foo` using
  // the PLT.
  return foo();
}

00000000 <main>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 53 push %ebx
4: 50 push %eax
5: e8 00 00 00 00 call a <main+0xa>
a: 5b pop %ebx
// We saw above how the R_386_GOTPC relocation gets resolved
// and that the ebx register contains the address of the
// GOT after the relocation is resolved.
b: 81 c3 03 00 00 00 add $0x3,%ebx
    d: R_386_GOTPC _GLOBAL_OFFSET_TABLE_
11: c7 45 f8 00 00 00 00 movl $0x0,-0x8(%ebp)
// Compiler wants to access `foo`, but since we told it to
// generate position-independent code, it generates access to
// `foo` using its PLT entry and leaves a TODO for the linker to
// find `foo`'s PLT entry address.
//
// The PLT machinery was explained here!
18: e8 fc ff ff ff call 19 <main+0x19>
    19: R_386_PLT32 foo
1d: 83 c4 04 add $0x4,%esp
20: 5b pop %ebx
21: 5d pop %ebp
22: c3 ret

Testing

While I did talk about setting up a “test loop” earlier, here I want to briefly touch upon the topic of regression tests – not so much upon the “why” and the “what”, but the “how”. There are some excellent testing utilities already available in the LLVM project, but I found the related documentation to be lagging. Specifically, I want to focus on the utilities that one might interact with for writing a regression test for one of LLVM JITLink’s target-object backends.

Before we go ahead, I want to mention this high level testing guide for LLVM. The guide should get you to the point where you know where/how to create a test file, how to make your tests discoverable by the test runner (LLVM Integration Tester - lit) and how to run the tests using the test runner.

That said, let’s use the sample test file below to talk about the utilities that you might use for writing a regression test for one of LLVM JITLink’s target-object backends.

// Regression test files are assembly files (".s" extension).
// The files must begin with what are known as "RUN" lines.
// Each "RUN" line tells lit how to run the test file.
// RUN lines look and feel like you were running shell commands.

// Each regression test will likely begin with the following
// two RUN lines, although the exact RUN command may need to be
// modified, based on the test case's needs.
# RUN: llvm-mc -triple=i386-unknown-linux-gnu -position-independent -filetype=obj -o %t.o %s

// Notice how llvm-jitlink is run with the "-noexec" option.
// The option tells llvm-jitlink to not run the code loaded
// to memory. This is important because JITLink may be linking
// and loading code for an architecture different from the one
// where the regression test is running in LLVM's build/release
// pipeline.
# RUN: llvm-jitlink -noexec %t.o

// llvm-jitlink also requires each file to have a "main" function.
// Your test code can go here, but it doesn't have to.
.text
.globl main
.p2align 4, 0x90
.type main,@function
main:
    ret
.size main, .-main

The main thing that we want to determine in these target-object backend regression tests is whether the relocations in the code emitted to memory were fixed up correctly. Meaning, we have to literally check whether certain bytes in certain memory locations are what we expect them to be. Let’s look at some more intricate test cases that will show the different kinds of checks we might need to perform and how we can perform them.

// llvm-jitlink allows you to specify jitlink-check expressions.
// jitlink-check expressions are checks against working memory.
// jitlink-check expressions can be used with the `decode_operand` function.
// `decode_operand` decodes the instruction at the given label
// and then accesses the operand number that you have specified.
//
// For the expression below, `decode_operand` decodes the instruction at the
// label `foo`, accesses its 0th operand `external_data` and checks whether
// its value is equal to the bytes represented by `0xDEADBEEF`.
//
// Note - The operand number does not always have a one-to-one mapping
// with what you see and while in this case `external_data` was indeed the
// 0th operand of the instruction, for another instruction its operand
// number may have been different.
# jitlink-check: decode_operand(foo, 0) = 0xDEADBEEF
    .globl foo
    .p2align 4, 0x90
    .type foo,@function
foo:
    movl external_data, %eax
.size foo, .-foo

// The RHS of jitlink-check expressions doesn't have to be literal
// bytes. It can be an expression of labels and other functions over
// labels.
//
// In the below jitlink-check expression, the RHS
// is calculating the difference between the address of the label
// `foo` and the value of the program counter after the instruction at
// label `bar` (that is, the address of the next instruction).
# jitlink-check: decode_operand(bar, 0) = foo - next_pc(bar)
    .globl bar
    .p2align 4
    .type bar,@function
bar:
    calll foo
.size bar, .-bar

// The `got_addr` function can also be used on the RHS, to access the
// address of the GOT entry of a symbol.
//
// In the below jitlink-check expression, the RHS is calculating the
// offset between the GOT entry for the symbol `named_data` and the
// GOT symbol itself.
# jitlink-check: decode_operand(test_got, 4) = got_addr(test_file_name.o, named_data) - _GLOBAL_OFFSET_TABLE_
    .globl test_got
    .p2align 4, 0x90
    .type test_got,@function
test_got:
    leal named_data@GOT, %eax
.size test_got, .-test_got

// The LHS of a jitlink-check expression can also be constructed manually
// by "casting" a symbol, label or a function over a label to a machine register
// size pointer.
//
// In the below jitlink-check expression the LHS is constructed by casting the
// address of the GOT entry for `named_data` to a 32-bit pointer. The constructed
// pointer is then dereferenced and compared against the `named_data` label.
# jitlink-check: *{4}(got_addr(test_file_name.o, named_data)) = named_data
    .globl test_got
    .p2align 4, 0x90
    .type test_got,@function
test_got:
    leal named_data@GOT, %eax
.size test_got, .-test_got

// NOTE - The above presented flavors of jitlink-check expressions are not an
// exhaustive list of what's available. Rather it's just a summarization of some
// of the ways in which I used jitlink-check expressions.
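To build some intuition for what these checks compare, here is a toy model of two of them in Python. It is only a sketch: the memory image, the addresses and the `read_le` helper are invented for illustration and say nothing about how llvm-jitlink evaluates the expressions internally.

```python
MASK32 = 0xFFFFFFFF


def read_le(image, base_addr, addr, size):
    """Read `size` bytes at `addr` as a little-endian integer, like *{size}(addr)."""
    offset = addr - base_addr
    return int.from_bytes(image[offset:offset + size], "little")


def write_le(image, base_addr, addr, value, size):
    """Store `value` at `addr` as `size` little-endian bytes."""
    offset = addr - base_addr
    image[offset:offset + size] = (value & MASK32).to_bytes(size, "little")


# Hypothetical working memory after linking.
BASE = 0x1000
image = bytearray(0x40)
GOT_ENTRY = 0x1008    # pretend result of got_addr(..., named_data)
NAMED_DATA = 0x1010
FOO = 0x1020
BAR = 0x1030          # `calll foo` = opcode e8 followed by a 4-byte operand
NEXT_PC_BAR = BAR + 5

# What a correct link would have written into memory.
write_le(image, BASE, GOT_ENTRY, NAMED_DATA, 4)
image[BAR - BASE] = 0xE8
write_le(image, BASE, BAR + 1, FOO - NEXT_PC_BAR, 4)

# *{4}(got_addr(...)) = named_data
got_check = read_le(image, BASE, GOT_ENTRY, 4) == NAMED_DATA
# decode_operand(bar, 0) = foo - next_pc(bar), compared as 32-bit patterns
call_check = read_le(image, BASE, BAR + 1, 4) == (FOO - NEXT_PC_BAR) & MASK32
print(got_check, call_check)  # True True
```

In both cases the left-hand side is "whatever bytes ended up in memory" and the right-hand side is "what the relocation formula says they should be".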

Recap and conveniences

We covered a lot of ground there. Let’s quickly recap the things we talked about.

We established the context required to understand the project. We defined basic concepts - linking and JIT linking, and talked about the need for JIT linking and LLVM JITLink.
We established an understanding of what the project was.
We went over the execution details of the project. We talked about important high level constructs, the high level JIT linking algorithm used by LLVM JITLink, setting up a testing loop for constant feedback and the details of each relocation that was added as part of the i386/ELF backend.
Finally, we talked about the tools and utilities that can be used to write regression tests for the project.

Resources

Below is an index of resources that I found useful (I may have mentioned them elsewhere in the post as well).

Chris Kanich’s videos for the systems programming course at University of Illinois, Chicago
Lang Hames’ videos (1, 2, and 3) on LLVM ORC APIs and JITLink. These videos were extremely valuable in understanding JITLink’s raison d’être and the context in which it is used.
Linkers and Loaders by John R. Levine
Oracle’s Linker and Libraries guide
LLVM JITLink documentation
LLVM testing infrastructure guide
Articles by Eli Bendersky on position-independent code and load time relocation of shared libraries.

Development conveniences

Dev setup
1. There’s detailed information about that on the getting started with LLVM page.
2. If you’re okay using AWS EC2 for development you can create an instance using my public machine image.
  1. The image id is ami-00f1c534fe06c05a0. You can use the instructions here to boot your instance using this image.
  2. The instance comes with all the basic tools and software that you need to start contributing to LLVM - Clang, CMake, Python, zlib, ninja, git, php, arc.
Build system
1. Use ninja! It is way faster than make.
2. If you want to build a llvm-jitlink binary, which you likely will for testing, just run ninja llvm-jitlink from your build folder’s root. This will avoid building other targets that you do not need and will complete much faster.
3. Other than that I’m going to refrain from saying too much here because one’s build system configuration rarely works on another’s machine.
Files you will be dealing with
1. You’ll likely be dealing with files under llvm/lib/ExecutionEngine/JITLink.
2. The introductory commits for the i386/ELF and AArch64/ELF backends will give you a very good idea of what a minimal backend implementation looks like. Keep in mind that these first commits are for backends for an existing object format (ELF in this case). If you are adding support for a backend without an existing object format, you might need to ask for help (more on that in a bit).
Code reviews
1. Install arcanist - the tool for creating code reviews when contributing to LLVM.
2. Once installed the main commands you will be using are (assuming you are using git as your version control) -
  1. arc diff - To create code reviews and new revisions.
  2. arc land - To close code reviews once they are approved and push changes to remote.
Asking for help
1. #jit channel of the LLVM discord server.

Closing thoughts

And that’s about it! I don’t have too much more to say on top of what’s already been said. It was a great learning experience contributing to LLVM JITLink. I’d recommend it to anyone who wants to understand the story after compilation and up until a program is run - come say hi on the #jit channel of the LLVM discord server!

I’d also like to give a big shout out to the folks on the #jit channel for helping me understand things and answering my questions. And a special thanks to Lang Hames for his help throughout the project and reviewing this post (thanks to Stefan Gränitz and Vassil Vassilev for reviewing as well)!

I plan on continuing my involvement with LLVM and JITLink, and am excited to see what I pick up next!

Appendix

What are GOT and PLT?

The GLOBAL_OFFSET_TABLE (GOT) and PROCEDURE_LINKAGE_TABLE (PLT) are two tables used during the linking process that make dynamic linking work. Dynamically linked code needs to be position-independent⁶, meaning that it should be able to be loaded at any address in memory and continue to work - all symbols that it references and all referenced symbols that it contains must be resolvable. Dynamically linked shared libraries can fulfill this position independence requirement for internal symbols using pc-relative addressing, as the code of the shared libraries stays together in memory. However, these libraries may also refer to external symbols or contain symbols referenced by other shared libraries or the main executable. And since the load address of shared libraries in memory is not fixed, they require another layer of abstraction for the resolution of external symbols, for both data and functions. GOT and PLT are those abstractions.

Below are some simple visual examples to understand the GOT and PLT.

Data symbol access through GOT

Function symbol access through PLT
1. At load time
  1. Call to x is generated via the PLT.
  2. There is also an entry for x in the GOT, whose purpose will become clear in a minute.
2. At first invocation time
  1. Control jumps to PLT[1]
  2. First instr of PLT[1] transfers control to *GOT[3] (the address stored in GOT[3]).
    1. Remember the address stored in GOT[3] is that of the 2nd instr of PLT[1].
    2. Wait, so we go all the way to the PLT, to then go all the way to the GOT only to come back to the next instr in PLT[1]. We would have come there anyway as the processor processes instructions sequentially. Why did we take this roundabout route?
    3. We’ll see in a second!
  3. Once we’re on the 2nd instr of PLT[1] we push a value on the stack (what this value means is not super important here).
  4. Then control jumps to PLT[0], from where you can see it eventually jumps to the address stored in GOT[2].
    1. And whose address is stored in GOT[2]? The dynamic linker’s!
  5. The dynamic linker then goes ahead and links x (the external lib function we called) into the process and fixes the address in GOT[3] (which was initially the address of the 2nd instr of PLT[1]) to the new address of x in the process.
  6. The dynamic linker then invokes x, which is what the user wanted to do in the first place.
3. Subsequent invocations
  1. The first invocation of x looked like this -
    1. PLT[1] → *GOT[3] → PLT[1] → PLT[0] → *GOT[2] → x()
  2. However, since the dynamic linker now fixed the entry in GOT[3] to reflect x’s address, subsequent invocations of x look as below -
    1. PLT[1] → *GOT[3] (which is essentially calling x, since that is what GOT[3] stores now)

With the above process dynamic linking enables us to call functions in position-independent code. Additionally, it makes the common case (every invocation of x other than the first) faster!
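The steps above can be sketched as a small simulation. Python callables stand in for code addresses, a dictionary stands in for the GOT, and the class and method names are made up for this example, so treat it as a model of the control flow rather than of any real dynamic linker.

```python
class ToyProcess:
    """Toy model of lazy binding: PLT calls jump through a patchable GOT slot."""

    def __init__(self, shared_libs):
        self.shared_libs = shared_libs  # symbol name -> real function
        self.got = {}                   # symbol name -> current jump target
        self.resolver_runs = 0          # how often the "dynamic linker" ran

    def add_plt_entry(self, name):
        # At load time the GOT slot does not hold the function's address yet.
        # It points at a stub that hands control to the dynamic linker.
        def lazy_stub(*args):
            self.resolver_runs += 1
            real_function = self.shared_libs[name]  # link the symbol in
            self.got[name] = real_function          # patch the GOT slot
            return real_function(*args)             # finally invoke it
        self.got[name] = lazy_stub

    def call(self, name, *args):
        # A PLT entry boils down to "jump to whatever the GOT slot holds".
        return self.got[name](*args)


process = ToyProcess({"x": lambda: 7})
process.add_plt_entry("x")

first = process.call("x")   # goes through the stub and the resolver
second = process.call("x")  # jumps straight to x via the patched slot
print(first, second, process.resolver_runs)  # 7 7 1
```

The resolver count staying at 1 after two calls is the "common case is faster" property: only the first invocation pays for symbol resolution.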

Footnotes

AOT, statically compiled languages - C, C++, Rust, unlike interpreted languages, such as Java, do not have a runtime that can be extended to bring in new symbols at runtime and perform symbol resolution for them. Java, for instance, has the Java Virtual Machine (JVM) whose loading and linking behavior can be customized to achieve the aforementioned task. ↩︎
JIT-linking is primarily useful in the context of linking in pre-compiled languages (that’s certainly what inspired it), but it’s not only useful in that context. In LLVM JITLink, through the JITLinkContext you can link against other (non-statically compiled) code, so it’s useful for anyone who wants to interoperate with C/C++ that’s linked at runtime. You could also theoretically bring up a purely JIT’d language with it (and I think Julia does this). The advantages are interoperability with existing languages, compilers, tools, and the disadvantage is that it’s heavyweight compared to a custom JIT that manages its own linking directly. ↩︎
Clang-REPL is an effort to move Cling, which is a standalone tool, into the LLVM infrastructure. ↩︎
In fact, given a suitable JITLinkContext, JITLink can even link objects into a different process. LLDB uses this capability (via LLVM’s older MCJIT APIs) to JIT-link expressions in the debugger, but run them in the process being debugged, which may be on a different machine. ↩︎
I found the following 2 resources very useful for understanding cross-compilation.
1. “Cross-platform Compilation” chapter of the Getting Started with LLVM Core Libraries book.
2. Clang documentation for cross-compiling.
↩︎
Dynamically linked code can actually use the “static” relocation model, but the position-independent model is generally preferred. In position-independent code you only need to fix up the GOT, whereas in static code you need to fix up every external reference, which can hurt launch times. ↩︎

The 2023 EuroLLVM Developers' Meeting Program

Mon, 27 Mar 2023 00:00:00 +0000

The LLVM Foundation is excited to announce the 2023 EuroLLVM Developers’ Meeting program! Early bird registration ends April 10th.

Keynotes:

Order out of Chaos, The LLVM Release Process. - Tobias Hieta
“-fbounds-safety”: Enforcing bounds safety for production C code - Yeoul Na

Technical Talks:

An example of data flow analysis in MLIR - Tom Eccles
MLIR-based offline memory planning and other graph-level optimizations for xcore.ai - Deepak Panickal
A Rusty CHERI: The path to hardware capabilities in Rust - Lewis Revill
Extending the AArch32 JITLink backend - Stefan Gränitz
Using MLIR to Optimize Basic Linear Algebraic Subprograms - Steven Varoumas
Buddy Compiler: An MLIR-based Compilation Framework for Deep Learning Co-design - Hongbin Zhang
MachineScheduler - fine grain resource allocation using resource intervals. - Francesco Petrogalli
Inliner in MLIR - Javed Absar
How to use llvm-debuginfo-analyzer tool. - Carlos Alberto Enciso
Practical Global Merge Function with ThinLTO - Kyungwoo Lee
Prototyping MLIR in Python - Sasha Lopoukhine, Mathieu Fehr
Extensible and Composable Dataflow Analysis in MLIR - Jeff Niu
What would it take to remove debug intrinsics? - Jeremy Morse
Compiling Ruby (with MLIR) - Alex Denisov
What’s new in MLIR? - Mehdi Amini
Structured Bindings and How to Analyze Them - Domján Dániel
MLIR Dialect Design and Composition for Front-End Compilers - Jeff Niu
ML-LLVM-Tools: Towards Seamless Integration of Machine Learning in Compiler Optimizations - S. VenkataKeerthy, Siddharth Jain, Umesh Kalvakuntla
Optimizing the Linux Kernel with LLVM BOLT - Maksim Panchenko
mlir-meminfo : A Memory Model for MLIR - Kunwar Grover, Arjun Pitchanathan

Tutorials:

Developing BOLT pass - Amir Ayupov
A whirlwind tour of the LLVM optimizer - Nikita Popov
Tutorial: Controllable Transformations in MLIR - Alex Zinenko
GlobalISel by example - Alex Bradbury

Quick Talks:

Iterative Compilation - Give the compiler a second chance - Ziv Ben Zion
Another level of indirection - Compiler Fallback of load/store into gather/scatter enhance compiler robustness by overcoming analysis and hardware limitations - Omer Aviram
Switch per function in LLVM - Tomer Nissim Schneider
Tensor Evolution - An ML Graph Optimization Technique - Javed Absar, Muthu Baskaran
ML-on-CPU: should vectorization happen in the LLVM backend or higher up the stack? - Elen Kalda
CORE-V LLVM: Adding eight vendor extensions to standard RISC-V LLVM - Charlie Keaney, Chunyu Liao (廖春玉), Lewis Revill
Advanced Bug Reports: Choose Your Own Adventure - Arseniy Zaostrovnykh
Multiple-Entry, Multiple-Exit MLIR Regions - Jeff Niu
Target-Independent Integer Arithmetic - Jeff Niu
Improving Vectorization for Loops with Control Flow - Ashutosh Nema
How to run the LLVM-Test Suite on GPUs and what you’ll find - Johannes Doerfert
OpenMP as GPU Kernel Language - Johannes Doerfert

Lightning Talks:

LLVM IR as an Embedded Domain-Specific Language - Nikita Baksalyar
Using MLIR for Dalvik Bytecode Analysis - Eduardo Blázquez
High school student’s experience with Clang - Yubo Hui
Spot the Difference with LLVM-FLOW: an open-source interactive visualization tool for comparing IR CFGs - Jinmyoung Lee
Leveraging MLIR for Better SYCL Compilation - Victor Lomüller
Arm/AArch64 Embedded Development with LLD : What’s New - Amilendra Kodithuwakku
Using automated tests to tune the -Og pipeline - Stephen Livermore-Tozer
Buddy-CAAS: Compiler As A Service for MLIR - Hongbin Zhang
llvm-buildmark - observations, tips, and tricks on reducing LLVM build times - Alex Bradbury
Lock Coarsening optimizations for loops in Java - Anna Thomas

Student Technical Talks

Cost Modelling for Register Allocation and Beyond - Aiden Grossman
A Template-Based Code Generation Approach for MLIR - Florian Drescher
MLIR Query Tool for easier exploration of the IR - Devajith Valaparambil Sreeramaswamy
mlirSynth: Synthesis of Domain-Specific Programs in MLIR - Alexander Brauckmann
Image Processing Ops as first class citizens in MLIR: write once, vectorise everywhere! - Prathamesh Tagore, Hongbin Zhang
Using the Clang data-flow framework for null-pointer analysis - Viktor Cseh
Fast and Vectorized Pivot Function for MLIR Presburger Library - Qi
RISC-V Vector Extension Support in MLIR: Motivation, Abstraction, and Application - Hongbin Zhang

Posters:

Automatic Translation of C++ to Rust - Henrique Preto
A sustainable approach to have vector predication in the Loop Vectorizer - Lorenzo Albano
Performance Analysis of Undefined Behavior Optimizations - Lucian Popescu
Static Analysis for C++ Rust-Like Lifetime Annotations - Susana Monteiro
Leveraging MLIR for Better SYCL Compilation - Victor Lomüller
Forcefully Embedding MLIR into Python - George Mitenkov

We would also like to thank the program committee:

Kristof Beyls (Chair), Alex Bradbury, Alex Denisov, Anupama Chandrasekhar, David Spickett, Florian Hahn, Gabor Horvath, Hans Wennborg, Jakub Kuderski, Jonathan Springer, Jubi Taneja, Mehdi Amini, Michal Paczkowski, Min-Yih Hsu, Nadav Rotem, Paul Kirth, Petr Hosek, Quentin Colombet, Renato Golin, Stephen Neuendorffer, Timothy Harvey, and Tobias Grosser.

Interactive programming in C++ internships - LLVM Blog Post

Wed, 21 Dec 2022 00:00:00 +0000

Another program year is ending and our Compiler Research team is extremely happy to share the hard work and the results of our intern contributors!

The Compiler Research team includes researchers located at Princeton University and CERN. Our primary goal is research into foundational software tools helping scientists to program for speed, interoperability, interactivity, flexibility, and reproducibility. We work in various fields of science such as high energy physics, where research is fundamentally connected to software at exabyte scale. We develop computational methods and research software for scientific exploration and discovery. Our current research focuses on three main topics: interpretative C/C++/CUDA, automatic differentiation tools, and C++ language interoperability with Python and D.

This year, we had six participants who worked on seven projects, covering a wide range of topics, from LLVM’s new JIT linker (JITLink) to Clang-Repl, an interactive C++ interpreter for incremental compilation integrated in Clang. All projects work toward a common, ambitious goal: to establish a proficient workflow in LLVM, where interactive development in C++ is possible, and exploratory C++ becomes an accessible experience to a wider audience.

Below you can find the list of projects from our interns, with an overview of the objectives and results. We invite you to follow the links to access a more detailed description of each project.

JITLink support for a new format/architecture (ELF/AARCH64)

Developer: Sunho Kim (Computer Science, De Anza College, Cupertino, California)

Mentors: Stefan Gränitz (Freelance Compiler Developer, Berlin, Deutschland), Lang Hames (Apple), Vassil Vassilev (Princeton University/CERN)

Funding: Google Summer of Code 2022

Sunho developed a JITLink specialization that extends JITLink’s generic linker algorithm, allowing JITLink to support the ELF/aarch64 target and providing full support for all advanced PE/COFF object file features. By supporting the ELF/aarch64 target, JITLink can now be used in Julia, while COFF/X86_64 enables the Microsoft Visual C++ (MSVC) target in Clang-Repl. Here you can find Sunho’s GSoC final report.

Hurray! This project has been accepted as a Tutorial at the 2022 LLVM Developers’ Meeting! The conference took place in San Jose, California, from 7th to 10th November 2022. Here you can find the video and the slides of Sunho’s presentation.

The project was also presented at the Compiler As a Service (CaaS) monthly meeting. Here you can find the video and the slides of Sunho’s presentation.

Extend Clang to resugar template specialization accesses

Developer: Matheus Izvekov

Mentors: Richard Smith (Google), Vassil Vassilev (Princeton University/CERN)

Funding: Google Summer of Code 2022

Clang’s type system was optimized by pushing type-syntactic-sugar on the arguments of a template specialization into those member accesses. This was achieved by: 1. creating a new type node in the Abstract Syntax Tree (AST) that represents the sugar of a member access in a template specialization; and 2. implementing single step desugaring logic which will perform the substitution of template parameter sugar into the corresponding specialization argument sugar. As a result, we improved Clang’s diagnostic system by enabling other constructs to preserve type sugar, allowing both for their representation when present in the specialization argument. Here you can find Matheus’ GSoC final report.

Hurray! This project has been accepted as a Lightning Talk at the 2022 LLVM Developers’ Meeting! The conference took place in San Jose, California, from 7th to 10th November 2022. Here you can find the video and the slides of Matheus’ presentation.

Recovering from Errors in Clang-Repl and Code Undo

Developers: Jun Zhang (Software Engineering, Anhui Normal University, WuHu, China) and Purva Chaudhari (California State University Northridge, Northridge CA, USA)

Mentors: Vassil Vassilev (Princeton University/CERN)

Clang-Repl is an interactive C++ interpreter integrated in Clang, which enables incremental compilation. It supports interactive programming for C++ in a Read-Eval-Print Loop (REPL) style, compiling the code just-in-time with a JIT approach that reduces the compile-run cycles. We added the “undo” functionality to Clang-REPL, and we improved error-recovery by adding the possibility to recover the low-level execution infrastructure. As a result, Clang-REPL now supports reversal execution and recovery actions on behalf of a user in an automatic and convenient way.

Shared Memory Based JITLink Memory Manager

Developer: Anubhab Ghosh (Computer Science and Engineering, Indian Institute of Information Technology, Kalyani, India)

Mentors: Stefan Gränitz (Freelance Compiler Developer, Berlin, Deutschland), Lang Hames (Apple), Vassil Vassilev (Princeton University/CERN)

Funding: Google Summer of Code 2022

Anubhab introduced a new JITLinkMemoryManager based on a MemoryMapper abstraction that is capable of allocating JIT code (and data) using shared memory. The following advantages should arise from the implemented strategy: 1. a faster transport (and access) for code and data when running JIT’d code in a separate process on the same machine, and 2. the guarantee that all allocations are close together in memory and meet the constraints of the default code model, allowing the use of outputs from regular compilers. Here you can find Anubhab’s GSoC final report.

Optimize ROOT use of modules for large codebases

Developer: Jun Zhang (Software Engineering, Anhui Normal University, WuHu, China)

Mentors: David Lange (Princeton University), Alexander Penev (University of Plovdiv Paisii Hilendarski, Bulgaria), Vassil Vassilev (Princeton University/CERN)

Funding: Google Summer of Code 2022

The current performance of the modules usage in ROOT was evaluated, and a strategy for optimizing the memory footprint was developed. This strategy includes reducing unnecessary symbol lookups and module loading, and is especially useful for very large codebases like CMSSW, where Jun reduced the amount of memory by almost half (from 1.1GB to 600MB) for simple workflows like hsimple.C, and the number of loaded modules from 180 to 52. Here you can find Jun’s GSoC final report.

Add Initial Integration of Clad with Enzyme

Developer: Manish Kausik H (B.Tech and M.Tech in Computer Science and Engineering (Dual Degree), Indian Institute of Technology Bhubaneswar)

Mentors: David Lange (Princeton University), William Moses (Massachusetts Institute of Technology), Vassil Vassilev (Princeton University/CERN)

Funding: Google Summer of Code 2022

Clad is an open-source plugin to the Clang compiler that enables Automatic Differentiation for C++. Clad receives an Abstract Syntax Tree (AST) from the underlying compiler platform (Clang), decides whether a derivative is requested and produces it, and modifies the AST to insert the generated code. Enzyme AD is an LLVM-based AD plugin. It works by taking existing code as LLVM IR and computing the derivative (and gradient) of that function. In this project, Manish integrated Enzyme within Clad, giving a Clad user the option of selecting Enzyme for Automatic Differentiation on demand. The integration of Clad and Enzyme results in a new tool that offers an optimized and flexible implementation of automatic differentiation. Here you can find Manish’s GSoC final report.

Thank you, developers!

We hope our interns enjoyed our community. Our best reward is to know that we supported your early steps into the world of open-source software development and compiler construction. We hope that this experience motivated you to continue to be involved with the broad LLVM ecosystem. We express our gratitude to the Google Summer of Code Program and to the Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP) for supporting our research and providing six young developers this wonderful opportunity. A special thanks goes to the LLVM community for being supportive and for sharing their knowledge and introducing our community to the new contributors. Thank you so much! Thanks for choosing Compiler-Research for your internship! We were lucky to have you!

Office hours and the LLVM community calendar.

Thu, 27 Oct 2022 00:00:00 +0000

About a thousand people contribute code to LLVM each year. There are probably many thousands who work on the LLVM code base in downstream projects. And even more people use the LLVM libraries to build other cool projects on top of.

The many LLVM users and contributors have long been communicating with each other using mailing lists, Bugzilla, and more recently, Discourse and GitHub issues. Next to these asynchronous communication channels, the LLVM community has also long had more synchronous communication channels, such as IRC and Discord. Twice per year, there is the opportunity to communicate synchronously face-to-face with other LLVM-ers from around the world: at the LLVM developer meetings. The LLVM socials that happen in various places around the world also provide the opportunity to meet face-to-face with LLVM-ers in your area.

In recent years, we’ve added a number of online “face-to-face” synchronous communication channels. They often are the best way to make progress quickly. I think many don’t know of their existence yet, or just need a little encouragement to make more use of them. Therefore, I thought it was worthwhile to highlight them in this post:

Online sync-ups are regular calls on specific topics. A few of them have been running for years, but most started only in the last 2 years. A full list of current online sync-ups is documented at https://llvm.org/docs/GettingInvolved.html#online-sync-ups
More recently, “office hours” have started. In these, specific experienced people in the community make themselves available for a chat and to answer questions on anything in their area of expertise. They are documented at https://llvm.org/docs/GettingInvolved.html#office-hours. Whether you’re experienced or not in LLVM, both online sync-ups and office hours can be a great way to get advice and help on any issues you might be facing when using or contributing to LLVM.

One last way to find LLVM events that may be interesting to you is to look at our LLVM community calendar. It shows the schedule for most office hours, online sync-ups and socials. If you’re an organizer of any of these events, please do not forget to add your event to the community calendar by [email protected].

Announcing the LLVM Foundation Board of Directors for the 2022-2024 term

Mon, 03 Oct 2022 00:00:00 +0000

The LLVM Foundation would like to announce our Board of Directors for the 2022-2024 term:

Kit Barton (Secretary)
Kristof Beyls
Chris Bieneman
Mike Edwards (Treasurer)
Reid Kleckner
Anton Korobeynikov
Chris Lattner
Tanya Lattner (President)
Wei Wu

Three new members and six continuing members were elected to the nine-person board.

Thank you to retiring board members Tom Stellard, Cyndy Ishida, and Hal Finkel for all of their contributions to the board!

About the board of directors (listed alphabetically by last name):

Kit Barton

Kit Barton has been contributing to LLVM since 2015. His contributions have primarily been to the PowerPC backend and loop optimizations including the loop fusion pass. He has presented multiple technical talks and tutorials at the LLVM Dev conferences over the last several years.

In addition to the contributions to LLVM, over the last several years Kit has driven the effort within IBM to migrate their proprietary C/C++ and Fortran compilers to leverage LLVM technology. He is currently the technical lead for C/C++ and Fortran compilers on Power and z/OS at IBM.

[Email]

Kristof Beyls

Kristof Beyls has worked on LLVM since about 2010, initially as part of tech leading the migration of Arm’s C/C++ toolchain to be based on LLVM technology. Since then, Kristof has worked on a variety of code generation projects using LLVM. He has contributed to LLVM in the areas of security mitigations, performance tuning, Arm backends, test-suite, LNT, etc. He has been helping with the organization of EuroLLVM meetings since the start, has been organizing the FOSDEM LLVM dev rooms for the past couple of years, and has organized a few socials in Belgium. He has also been on the program committee for a few of the dev meetings. More recently, he works on progressing LLVM relicensing, getting LLVM office hours going and having an LLVM community calendar. Kristof is Senior Principal Engineer at Arm.

[Twitter] [GitHub] [LinkedIn] [Email]

Chris Bieneman

Chris has been an LLVM contributor since 2013 with contributions up and down the monorepo. His work has ranged from the frontend to the backend, to debuggers, linkers, and JITs.

Chris has presented at several LLVM events and is passionate about community building. The LLVM project and community have been enormous influences in his life, and he is looking forward to the opportunity to work with the Foundation and community to bring that experience to more people around the world.

Chris is currently an engineer on the HLSL compiler team at Microsoft working on advancing graphics programming for Xbox and DirectX and bringing HLSL support into Clang.

[Twitter] [GitHub] [Email]

beanz on LLVM’s Discourse, Discord, and IRC.

Mike Edwards

Mike got involved with the LLVM project back in 2014 by contributing to the infrastructure projects. He has helped with everything from CI to the transition to GitHub.

Mike has served as the LLVM Foundation’s Treasurer since 2018. Mike cares very much about the mission of the LLVM Foundation and credits many of his career opportunities as a direct result of being involved with the LLVM community. He is looking forward to helping to build opportunities for many others to benefit from programs the Foundation has to offer.

[Email]

Reid Kleckner

Reid Kleckner began contributing to LLVM in 2009 while working on Unladen Swallow, a JIT for Python. That project inspired him to go deeper in compilers. After a long detour involving the DynamoRIO and DrMemory projects, in 2013, Reid joined a team at Google working on Clang and MSVC compatibility. This project required making wide-ranging changes across the compiler: frontend parser changes, Microsoft C++ ABI features, fixes for Windows calling conventions, new IR features such as musttail, parts of the Windows exception handling representation, and parts of the CodeView/PDB debug info format support.

Aside from direct code contributions, Reid has given two talks at US LLVM developer meetings and served on the program committee once. He looks forward to improving the community experience for the next generation of LLVM contributors. Reid currently manages a team supporting C++ and LLVM at Google.

[Twitter] [GitHub] [Email]

Anton Korobeynikov

Anton Korobeynikov began contributing to the LLVM project in 2006. Over the years, he has made numerous technical contributions to different areas including Windows support, ELF features, debug info, exception handling, and backends such as ARM and x86. He was the original author of the MSP430 and original System Z backends.

In addition to his technical contributions, Anton has maintained LLVM’s participation in Google Summer of Code by managing applications, deadlines, and overall organization. He also supports the LLVM infrastructure, drove the Bugzilla to GitHub migration and has been on several program committees for the LLVM Developers’ Meetings (both US and EuroLLVM).

Anton is currently an associate professor at the Saint Petersburg State University and has served on the LLVM Foundation board of directors for the last 8 years.

[Twitter] [GitHub] [LinkedIn] [Email] [Website]

Chris Lattner

Chris Lattner cofounded the LLVM Compiler infrastructure project, the Clang compiler, the Swift programming language, the MLIR compiler infrastructure, the CIRCT project, and has contributed to many other commercial and open source projects at Apple, Tesla, Google and SiFive. He is currently Cofounder and CEO of Modular, which is building an innovative new developer platform for AI Software.

[Twitter] [GitHub] [LinkedIn] [Website] [Email]

Tanya Lattner

Tanya Lattner has been involved in the LLVM project for over 20 years. She began as a graduate student who wrote her master’s thesis using LLVM, and continued on using and extending LLVM technologies at various jobs during her career as a compiler engineer.

Tanya is also a long-time volunteer with the LLVM Project. She has organized LLVM Developers’ Meetings and workshops, was the release manager for several years, and administers LLVM infrastructure.

With the support of the initial board of directors, Tanya created the LLVM Foundation, defined its charitable and educational mission, and worked to get 501(c)(3) status. She is passionate about the LLVM Community and wants to help see it thrive and grow for years to come.

Tanya is the COO and President of the LLVM Foundation.

[Twitter] [GitHub] [LinkedIn] [Email]

Wei Wu

Wei Wu is the co-founder and director of the PLCT lab, ISCAS. He formed an LLVM development team in the PLCT lab which is actively contributing code to LLVM upstream (currently mainly RISC-V backend and MLIR framework). He is also serving the LLVM IWG as a volunteer, trying to improve LLVM’s infrastructure.

He is the chairman of the OSDT (including HelloGCC and HelloLLVM) community, which was founded in 2007. OSDT is an open community aiming to promote the development of toolchains, language VMs, emulators/simulators and other tools for software developers. He established the HelloLLVM community in China a few years ago, dedicated to attracting and cultivating new LLVM developers. HelloLLVM is now a sub-community of OSDT.

He is very passionate about teaching compiler techniques to Chinese engineers and amateurs. He often gives talks and lectures among amateurs and college students in OSDT meetups. He also collaborates with college teachers, providing LLVM-related seminars and experiment courses.

[Email]

Announcing the 2022 LLVM Developers' Meeting Program

Fri, 30 Sep 2022 00:00:00 +0000

We had an amazing group of talk proposals submitted for the 2022 LLVM Developers’ Meeting. Thank you to all that submitted a talk proposal this year!

Here is the 2022 LLVM Developers’ Meeting program:

Keynotes:

Paths towards unifying LLVM and MLIR - Nicolai Hähnle
Implementing Language Support for ABI-Stable Software Evolution in Swift and LLVM - Doug Gregor

Technical Talks:

Implementing the Unimplementable: Bringing HLSL’s Standard Library into Clang - Chris Bieneman
Heterogeneous Debug Metadata in LLVM - Scott Linder
Clang, Clang: Who’s there? WebAssembly! - Paulo Matos
MC/DC: Enabling easy-to-use safety-critical code coverage analysis with LLVM - Alan Phipps
What does it take to run LLVM Buildbots? - David Spickett
llvm-gitbom: Building Software Artifact Dependency Graphs for Vulnerability Detection - Bharathi Seshadri, Yongkui Han
CuPBoP: CUDA for Parallelized and Broad-range Processors - Ruobing Han
Uniformity Analysis for Irreducible CFGs - Sameer Sahasrabuddhe
Using Content-Addressable Storage in Clang for Caching Computations and Eliminating Redundancy - Steven Wu, Ben Langmuir
Direct GPU Compilation and Execution for Host Applications with OpenMP Parallelism - Shilei Tian, Joseph Huber
Linker Code Size Optimization for Native Mobile Applications - Gai Liu
Minotaur: A SIMD Oriented Superoptimizer - Zhengyang Liu
ML-based Hardware Cost Model for High-Level MLIR - Dibyendu Das, Sandya Mannarswamy
VAST: MLIR for program analysis of C/C++ - Henrich Lauko
MLIR for Functional Programming - Siddharth Bhat
SPIR-V Backend in LLVM: Upstream and Beyond - Michal Paszkowski, Alex Bezzubikov
IRDL: A Dialect for dialects - Mathieu Fehr, Théo Degioanni
Automated translation validation for an LLVM backend - Nader Boushehrinejad Moradi
llvm-dialects: bringing dialects to the LLVM IR substrate - Nicolai Hähnle
YARPGen: A Compiler Fuzzer for Loop Optimizations and Data-Parallel Languages - Vsevolod Livinskii
RISC-V Sign Extension Optimizations - Craig Topper
Execution Domain Transition: Binary and LLVM IR can run in conjunction - Jaeyong Ko

Tutorials:

Using LLVM’s libc - Sivachandra Reddy, Michael Jones, Tue Ly
How to implement a new JITLink backend in a week - Sunho Kim

Panels (some speakers still to be announced):

Machine Learning Guided Optimizations (MLGO) in LLVM
Static Analysis in Clang - Gabor Horvath, Artem Dergachev, Bruno Cardoso Lopes
High-level IRs for a C/C++ Optimizing Compiler - Bruno Lopes, Ivan Baev, Johannes Doerfert, Mehdi Amini
Panel discussion on “Best practices with toolchain release and maintenance” - Aditya Kumar

Student Technical Talks:

Merging Similar Control-Flow Regions in LLVM for Performance and Code Size Benefits - Charitha Saumya
Alive-mutate: a fuzzer that cooperates with Alive2 to find LLVM bugs - Yuyou Fan
Enabling Transformers to Understand Low-Level Programs - Zifan Guo, William S. Moses
LAGrad: Leveraging the MLIR Ecosystem for Efficient Differentiable Programming - Mai Jacob Peng
Scalable Loop Analysis - Vir Narula

Quick Talks:

LLVM Education Initiative - Chris Bieneman, Kit Barton, Mike Edwards
Enabling AArch64 Instrumentation Support In BOLT - Elvina Yakubova
Approximating at Scale: How strtofloat in LLVM’s libc is faster - Michael Jones
MIR support in llvm-reduce - Matthew Arsenault
Interactive Crashlogs in LLDB - Med Ismail Bennani
clang-extract-api: Clang support for API information generation in JSON - Zixu Wang
Using modern CPU instructions to improve LLVM’s libc math library. - Tue Ly
Challenges Of Enabling Golang Binaries Optimization By BOLT - Vasily Leonenko, Vladislav Khmelevskyi
Inlining for Size - Kyungwoo Lee, Ellis Hoag, Nathan Lanza
Automatic indirect memory access instructions generation for pointer chasing patterns - Przemysław Ossowski
Link-Time Attributes for LTO: Incorporating linker knowledge into the LTO recompile - Todd Snider
Expecting the expected: Honoring user branch hints for code placement optimizations - Stan Kvasov, Vince Del Vecchio
CUDA-OMP — Or, Breaking the Vendor Lock - Johannes Doerfert, Joseph Huber
Thoughts on GPUs as First-Class Citizens - Johannes Doerfert, Shilei Tian, Joseph Huber
Building an End-to-End Toolchain for Fully Homomorphic Encryption with MLIR - Alexander Viand

Lightning Talks:

LLVM Office Hours: addressing LLVM engagement and contribution barriers - Kristof Beyls
Improved Fuzzing of Backend Code Generation in LLVM - Yuyang Rong
Interactive Programming for LLVM TableGen - David Spickett
Efficient JIT-based remote execution - Anubhab Ghosh
FFTc: An MLIR Dialect for Developing HPC Fast Fourier Transform Libraries - Yifei He
Recovering from Errors in Clang-Repl and Code Undo - Purva Chaudhari, Jun Zhang
10 commits towards GlobalISel for PowerPC - Kai Nacke, Amy Kwan
Nonstandard reductions with SPRAY - Jan Hueckelheim, Johannes Doerfert
Type Resugaring in Clang for Better Diagnostics and Beyond - Matheus Izvekov
Swift Bindings for LLVM - Egor Zhdan
Min-sized Function Coverage with IRPGO - Ellis Hoag, Kyungwoo Lee
High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs in Polygeist/MLIR - William S. Moses, Ivan R. Ivanov
Tools for checking and writing non-trivial DWARF programs - Chris Jackson
Analysis of RISC-V Vector Performance Using MCA Tools - Michael Maitland
Optimizing Clang with BOLT using CMake - Amir Ayupov
Exploring OpenMP target offloading for the GraphCore architecture - Jose M Monsalve Diaz

Posters (more posters to be announced at a later date):

Removal of Undef: Move Uninitialized Memory to Poison - John McIver
Optimizing Julia’s ORC JIT - Prem Chintalapudi
An LLVM-Based Compiler for Quantum-Classical Applications - Xin-Chuan Wu
Specializing Code to New Architectures via Dynamic Adaptive Recompilation - Quinn Pham, Dhanrajbir Singh Hira
LLFPTrax: Tracking ill-conditioned floating-point inputs using relative error amplification in LLVM - Tanmay Tirpankar
LLVM continuous upstream integration and testing - Jay Azurin, Keerthana Subramani
Automatic indirect memory access instructions generation for pointer chasing patterns - Adam Perdeusz

Thank you to the volunteers on the Program Committee for all of their hard work and time spent reviewing proposals. A special thanks also goes out to this year’s chair - Anton Korobeynikov. Here is the complete 2022 LLVM Developers’ Meeting Program Committee:

Kristof Beyls
Andrey Bokhanko
Chelsea Cassanova
Johannes Doerfert
Florian Hahn
Petr Hosek
Min-Yih Hsu
Anton Korobeynikov (Chair)
Aditya Kumar
Hem Neema
Diego Novillo
Fangrui Song
J. Ryan Stinnett
Caroline Tice
Mircea Trofin

Registration closes on October 31st, so register today for the 2022 LLVM Developers’ Meeting.

Text formatting in C++ using libc++

Tue, 27 Sep 2022 00:00:00 +0000

Historically, formatting text in C++ using the standard library has been unpleasant. It is possible to get nice output with the stream operators, but it is very verbose due to the stream manipulators. The other option is using printf, which is convenient but only supports a limited number of types and is not extensible. A non-standard option is using the {fmt} library. This article provides a short introduction to the parts of this library that were standardized in C++20 as std::format, as well as the current implementation status in LLVM 15.

What is `std::format`

std::format is a text formatting library using format strings similar to Python’s format and extensible for user defined types.

#include <format>
#include <iostream>

int main() {
  std::cout << std::format("Hello {} in C++{}", "std::format", 20);
}

Writes the following output:

Hello std::format in C++20

The {} indicates a replacement field like % in printf. With std::format the argument types are known, so it is not required to specify them in the replacement field.

The desired output format and the positional argument to use for each replacement field can also be specified. (For brevity, the following examples omit the required includes.)

Writes the first positional argument using different bases, a prefix, and zero padding to 8 columns.

int main() {
  std::cout << std::format("{0:#08b}, {0:#08o}, {0:08}, {0:#08x}", 16);
}

0b010000, 00000020, 00000016, 0x000010

It is possible to use an upper case prefix and hexadecimal digits.

int main() {
  std::cout << std::format("{0:#08B}, {0:#08o}, {0:08}, {0:#08X}", 15);
}

0B001111, 00000017, 00000015, 0X00000F

The alignment and fill character can be specified.

int main() {
  std::cout << std::format("{:#<8} {:*>8} {:-^5}", "Hello", "world", '!');
}

Hello### ***world --!--

When printing tables it is nice to be able to specify the alignment and width of the columns. However, formatting Unicode text can be especially tricky, since not every char (or wchar_t) is one “character”.

For example, the letter Á can be written in two ways:

LATIN CAPITAL LETTER A WITH ACUTE
LATIN CAPITAL LETTER A + COMBINING ACUTE ACCENT

This combining of multiple “characters” is used in several scripts and in emojis. (This “combined multiple characters” is known as extended grapheme clusters in Unicode.) The library has implemented these rules so it will count both forms of Á as using one column in the output.

Another issue with text formatting is that not every “character” has the same column width. Based on the “character” the column width is estimated to be one or two columns.

Below is an example taken from the paper that introduced the width estimation algorithm in std::format:

struct input {
  const char* text;
  const char* info;
};

int main() {
  input inputs[] = {
      {"Text", "Description"},
      {"-----",
       "------------------------------------------------------------------------"
       "--------------"},
      {"\x41", "U+0041 { LATIN CAPITAL LETTER A }"},
      {"\xC3\x81", "U+00C1 { LATIN CAPITAL LETTER A WITH ACUTE }"},
      {"\x41\xCC\x81",
       "U+0041 U+0301 { LATIN CAPITAL LETTER A } { COMBINING ACUTE ACCENT }"},
      {"\xc4\xb2", "U+0132 { LATIN CAPITAL LIGATURE IJ }"},       // Ĳ
      {"\xce\x94", "U+0394 { GREEK CAPITAL LETTER DELTA }"},      // Δ
      {"\xd0\xa9", "U+0429 { CYRILLIC CAPITAL LETTER SHCHA }"},   // Щ
      {"\xd7\x90", "U+05D0 { HEBREW LETTER ALEF }"},              // א
      {"\xd8\xb4", "U+0634 { ARABIC LETTER SHEEN }"},             // ش
      {"\xe3\x80\x89", "U+3009 { RIGHT-POINTING ANGLE BRACKET }"}, // 〉
      {"\xe7\x95\x8c", "U+754C { CJK Unified Ideograph-754C }"},  // 界
      {"\xf0\x9f\xa6\x84", "U+1F921 { UNICORN FACE }"},           // 🦄
      {"\xf0\x9f\x91\xa8\xe2\x80\x8d\xf0\x9f\x91\xa9\xe2\x80\x8d"
       "\xf0\x9f\x91\xa7\xe2\x80\x8d\xf0\x9f\x91\xa6",
       "U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466 "
       "{ Family: Man, Woman, Girl, Boy } "} // 👨‍👩‍👧‍👦
  };

  for (auto input : inputs) {
    std::cout << std::format("{:>5} | {}\n", input.text, input.info);
  }
}

(Note the column width is intended to look good on a terminal. The author has observed differences in quality of the output depending on the browser used.)

 Text | Description
----- | --------------------------------------------------------------------------------------
    A | U+0041 { LATIN CAPITAL LETTER A }
    Á | U+00C1 { LATIN CAPITAL LETTER A WITH ACUTE }
    Á | U+0041 U+0301 { LATIN CAPITAL LETTER A } { COMBINING ACUTE ACCENT }
    Ĳ | U+0132 { LATIN CAPITAL LIGATURE IJ }
    Δ | U+0394 { GREEK CAPITAL LETTER DELTA }
    Щ | U+0429 { CYRILLIC CAPITAL LETTER SHCHA }
    א | U+05D0 { HEBREW LETTER ALEF }
    ش | U+0634 { ARABIC LETTER SHEEN }
   〉 | U+3009 { RIGHT-POINTING ANGLE BRACKET }
   界 | U+754C { CJK Unified Ideograph-754C }
   🦄 | U+1F921 { UNICORN FACE }
   👨‍👩‍👧‍👦 | U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466 { Family: Man, Woman, Girl, Boy }

Attempting to format a value as the wrong type (e.g. formatting a string as a number) will result in a compilation error, instead of a runtime error with printf. Most of the major compilers provide a warning to try to detect incorrect format specifiers in printf, but this is not part of the specification, and in particular embedded compilers often don’t provide that warning. In contrast, std::format is specified to produce a compilation error, which is implemented in the library itself using C++20 consteval functions.

int main() {
  std::cout << std::format("{0:#08B}, {0:#08o}, {0:08}, {0:#08X}", "15");
}

The compiler output starts with this error, followed by a lot of not too useful messages.

error: call to consteval function 'std::basic_format_string<char, const char (&)[3]>::basic_format_string<char[37]>' is not a constant expression
  std::cout << std::format("{0:#08B}, {0:#08o}, {0:08}, {0:#08X}", "15");

In addition to outputting the formatted result to a string, it is also possible:

to output the result to an arbitrary output iterator,

int main() {
  std::format_to(
      std::ostream_iterator<char>(std::cout, ""),
      "Hello {} in C++{}\n", "std::format", 20);
}

to determine the output size,

int main() {
  std::cout << std::formatted_size("Hello {} in C++{}\n", "std::format", 20);
}

or limit the size of the output.

int main() {
  std::format_to_n(
      std::ostream_iterator<char>(std::cout, ""), 11,
      "Hello {} in C++{}\n", "std::format", 20);
}

Hello std::

An example of formatting user defined types is available in the standard library. It has formatting support for the chrono library. (This is not available in libc++ yet.) These formatters are quite complex. For other types it is possible to quickly create a formatter. For example, for the following enum class:

enum class color { red, green, blue };

Adding a formatter based on an existing formatter can be done like:

template <>
struct std::formatter<color> : std::formatter<const char*> {
  static constexpr const char* color_names[] = {"red", "green", "blue"};

  auto format(color c, auto& ctx) const -> decltype(ctx.out()) {
    using base = formatter<const char*>;
    return base::format(color_names[static_cast<int>(c)], ctx);
  }
};

Now all features of the const char* formatter are available in the color formatter:

int main() {
  std::cout << std::format("{:#<10}\n{:+^10}\n{:->10}\n",
                           color::red, color::green, color::blue);
}

red#######
++green+++
------blue

More examples and details of the specification can be found in this {fmt} cheat sheet.

The status in LLVM 15

In LLVM 15 most of the basic text formatting is complete. All major papers have been implemented, but some defect reports are not implemented. The libc++ team also wants to work on performance improvements and improvements to the compile-time errors. Some of these improvements have already landed in main and will be included in LLVM 16, while others are only planned.

Since the library is not entirely complete and the ABI may need to change (due to planned improvements but also changes voted by the C++ Committee), it is shipped as an experimental feature in LLVM 15. To use the code in libc++ you need to compile the code like:

clang++ -std=c++20 -stdlib=libc++ -fexperimental-library -ofoo foo.cpp

Format support for chrono is unavailable. Initial work has landed for LLVM 16, but none of it is available in LLVM 15. The chrono library itself lacks support for time zones, leap seconds, and some of the less common clocks. These need to be available before the formatting support for chrono can be completed.

Formatting improvements in C++23

In the examples the output is first formatted in a std::string before streaming it to the output. To avoid the temporary std::string it is possible to use std::format_to, but that doesn’t have an ergonomic syntax. In C++23 there will be std::print.

int main() {
  std::print("Hello {} in C++{}", "std::format", 20);
}

Hello std::format in C++20

There is little support for formatting containers. In C++23 it will become possible to format ranges and containers.

int main() {
  std::print("{::*^5}", std::vector<int>{1, 2, 3});
}

[**1**, **2**, **3**]

Some progress on formatting ranges has been made, but the main focus in the short term will be to finish C++20’s format implementation.

Closing words

Starting with C++20, formatting text becomes a lot more pleasant and C++23 has even more improvements lined up. This should provide long-awaited functionality in C++ and allow replacing several uses of <iostream> by a more convenient, faster and safer alternative.

Acknowledgements

Huge thanks to Victor Zverovich, the author of {fmt}. He has been heavily involved in getting std::format into the C++ Standard. He has helped review libc++’s implementation, and his insights and comments have improved the quality of the implementation.

August 2022 LLVM relicensing update & further suggestions for help

Wed, 17 Aug 2022 00:00:00 +0000

The last update on LLVM relicensing was done about 8 months ago. Since then we’ve made substantial progress, so I thought it’s worthwhile to share another update.

The TL;DR is:

Out of the about 32 million LOC that were contributed under the old license, we’ve reduced the lines of code that aren’t relicensed yet from 6% to only 2%.
8 months ago, we were still searching for ways to contact 808 individuals who contributed to LLVM over the past 20 years. We managed to reduce that number to 421 individuals now.
We’ve also reduced the number of companies or universities to get a relicensing agreement from, from 133 to 122.

Read on to find out how we achieved that great progress and how you can help with getting us closer to our end goal of having LLVM fully relicensed.

First of all, a big thank you to everyone who responded to the call for help in the November 2021 blog post and the 2021 LLVM dev meeting presentation. This level of progress would not have been possible without your action!

Next to receiving more relicensing agreements, we also started exploring some of the tactics described in the previous update under section “the end game”, for the pieces of code that we end up not receiving a relicensing agreement for, as described in the next sections.

Threshold of originality

Remember that licenses exist because of copyright law – the license is how the copyright owners of the code in LLVM give rights to users and other contributors to use their code in lots of useful and interesting ways.

This also means that if a piece of code is not protected by copyright law, there is no need for it to be covered by a license.

Some pieces of code are not covered by copyright law. For example, copyright law contains a concept called “Threshold of originality”. It means that a work needs to be “sufficiently original” for it to be considered to be covered by copyright. There could be a lot of different interpretations of what it means for a code contribution to be sufficiently original for it to be covered by copyright. A threshold that is often used in open source projects that use contributor license agreements (CLA) is to assume that any contribution that’s 10 lines of code or less does not meet the threshold of originality and therefore copyright does not apply. In their May 2022 board meeting, the LLVM Foundation decided to make the same assumption for the relicensing project: contributions of 10 lines of code or less are assumed to not be covered by copyright. Therefore, we don’t need relicensing agreements for those.

Furthermore, there are a few commits that don’t seem to meet the “threshold-of-originality” even though they’re changing/adding more than 10 lines. We also consider those not to need a relicensing agreement. One example is this commit, which only removes the full stop at the end of a few sentences.

Code no longer present in Top-of-Trunk.

We started exploring which of the not-yet-relicensed code is still in the current top-of-trunk code base. Some big contributions that weren’t covered yet are no longer there, for example the Microblaze and PIC16 backends. We manually checked and marked commits contributing to just these backends as not needing to be covered by relicensing agreements.

To help with finding more code that is no longer in the code base, I wrote a simple heuristic script that searches current top-of-trunk to see whether lines from a specific commit can still be found in the code base. If, for a given commit, no or very few lines can be found in current top-of-trunk, that’s a strong indication that the code is probably no longer there. That heuristic script indicates that for about 5% of the not-yet-relicensed commits, the code no longer seems to be present. I still need to find time to manually verify that the code in these commits is indeed no longer in the code base. This manual verification is something that someone could easily help me with. If anyone reading this would like to volunteer to help with that, please do let me know!
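The actual script is not shown in this post. As a rough illustration of the idea only, the following Python sketch computes which share of a commit's added lines still appears in the current tree. The function names, the whitespace normalization and the `min_length` cut-off are assumptions made for this example, not details of the real script.

```python
def normalize(line):
    """Collapse whitespace so that re-indented lines still match."""
    return " ".join(line.split())


def surviving_fraction(commit_lines, tree_lines, min_length=20):
    """Return the share of a commit's added lines still present in the tree.

    Lines shorter than `min_length` after normalization (for example a lone
    closing brace) are ignored because they would match almost anywhere.
    Returns None when the commit has no line long enough to be checked.
    """
    present = {normalize(line) for line in tree_lines}
    candidates = [normalize(line) for line in commit_lines]
    candidates = [line for line in candidates if len(line) >= min_length]
    if not candidates:
        return None
    found = sum(1 for line in candidates if line in present)
    return found / len(candidates)
```

A commit whose fraction is zero or close to zero would then be a candidate for manual verification. Such a heuristic cannot tell moved or reformatted code from deleted code, which is why a manual check is still needed.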

Next steps

In our quest to get nearer to 100% relicensing coverage, I believe the following are the most impactful next steps to take:

Continue accepting more relicensing agreements, from individuals and from corporations. An up-to-date list of who we still need to get agreements from is published as a spreadsheet here.
- We found that it can help a lot when corporations can get a list of which commits they’re agreeing to relicense. If you’re making progress on getting a corporation/company to sign, please do send an email to [email protected] asking for the list of commits that the company may own copyright on.
Go through the commits that look like they may no longer be in top-of-trunk, and verify that manually.

How can you help?

You could check if you know any individual still listed in this up-to-date spreadsheet. If you do know any of the people, please do reach out directly to them. I’m often the bottleneck when you share contact details with me and rely on me to reach out to them. If they do have any questions, I’m more than happy to try and answer them.
You could check if you know who the right people are to contact at any of the remaining companies/corporations and talk with them directly or share their contact details with us at [email protected]. They are also listed in the same spreadsheet, on the sheet starting with “Corporations”.
If you’d be interested in helping to check if specific commits are still present in the current code base, please do let me know at [email protected].

LLVM security group and 2021 security transparency report

Sat, 22 Jan 2022 00:00:00 +0000

Over the past few years, the LLVM project has seen the creation of a security group, which aims to enable responsible disclosure and fixing of security-related issues affecting the LLVM project.

The LLVM security group was established on the 10th of July 2020 by the act of the initial commit describing the purpose of the group and the processes it follows. Many of the group’s processes were still not well-defined enough for the group to operate well. Over the course of 2021, the key processes were defined well enough to enable the group to operate reasonably well:

We defined details on how to report security issues, see this commit on 20th of May 2021.
We refined the nomination process for new group members, see this commit on 30th of July 2021.
We wrote a first annual transparency report, which is published at https://llvm.org/docs/SecurityTransparencyReports.html. There is a copy of the 2021 transparency report below.

Over the course of 2021, we had 2 people leave the LLVM Security group and 4 people join.

In 2021, the security group received 13 issue reports that were made publicly visible before 31st of December 2021. The security group judged 2 of these reports to be security issues:

Both issues were addressed with source changes: #5 in clangd/vscode-clangd, and #11 in llvm-project. No dedicated LLVM release was made for either.

We believe that with the publishing of the first annual transparency report, the security group now has implemented all necessary processes for the group to operate as promised. The group’s processes can be improved further, and we do expect further improvements to get implemented in 2022. Many of the potential improvements end up being discussed on the monthly public call on LLVM’s security group.

New passes in clang-tidy to detect (some) Trojan Source

Wed, 12 Jan 2022 00:00:00 +0000

Trojan Source

The original Trojan Source paper encompasses a family of attacks that rely on Unicode properties to make code look different from how a compiler processes it. For instance the following code taken from the paper:

#include <stdio.h>
#include <stdbool.h>

int main() {
    bool isAdmin = false;
    /*‮ } ⁦if (isAdmin)⁩ ⁦ begin admins only */
        printf("You are an admin.\n");
    /* end admins only ‮ { ⁦*/
    return 0;
}

looks like there is a guard on isAdmin while the compiler actually reads the following byte stream

/* <U+0x202E> } <U+0x2066> if (isAdmin) <U+0x2069> <U+0x2066> begin admins only */

This issue got submitted before the official release to the LLVM Security group, and while we agreed this was more of a display issue than an actual compiler-related issue, we also agreed that having a clang-tidy check for each flaw described in the paper could not hurt.

Using clang-tidy

The tool named clang-tidy can run a bunch of extra passes on a codebase, detecting coding convention issues, API misuses, security flaws etc. We have been adding three new checkers:

Detecting misleading bidirectional characters

The new check misc-misleading-bidirectional parses each comment and string literal from the codebase and looks for unterminated bidirectional sequences, i.e. sequences that leak past the end of the comment or string literal, making regular code display right-to-left instead of the usual left-to-right. In the case of the example above we get a warning close to:

5:3: warning: comment contains misleading bidirectional Unicode characters [misc-misleading-bidirectional]
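To make “unterminated bidirectional sequence” concrete, here is a simplified Python sketch. It is not the implementation used by misc-misleading-bidirectional; it only counts the explicit directional formatting characters defined by the Unicode Bidirectional Algorithm and reports whether any embedding, override or isolate is still open at the end of a comment or string. The function name is made up for this example.

```python
# Explicit directional formatting characters from the Unicode
# Bidirectional Algorithm (UAX #9).
EMBEDDING_OPENERS = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
ISOLATE_OPENERS = {"\u2066", "\u2067", "\u2068"}  # LRI, RLI, FSI
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING closes an embedding or override
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE closes an isolate


def has_unterminated_bidi(text):
    """Return True if a bidi embedding, override or isolate is left open."""
    embeddings = 0
    isolates = 0
    for ch in text:
        if ch in EMBEDDING_OPENERS:
            embeddings += 1
        elif ch == PDF:
            embeddings = max(0, embeddings - 1)
        elif ch in ISOLATE_OPENERS:
            isolates += 1
        elif ch == PDI:
            isolates = max(0, isolates - 1)
    return embeddings > 0 or isolates > 0


# The first comment of the example, written with escapes so the control
# characters are visible in the source.
comment = "/*\u202e } \u2066if (isAdmin)\u2069 \u2066 begin admins only */"
print(has_unterminated_bidi(comment))
```

In that comment the right-to-left override and the second isolate are never closed, so their effect leaks past the end of the comment, which is exactly the situation the check warns about.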

Detecting misleading identifiers

C++ allows for some Unicode codepoints within identifiers, including identifiers that have a strong right-to-left direction, which can lead to misleading statements. For instance in the following,

int א = ג;

Are we assigning to א or to ג? We are actually doing the latter, and that is confusing. The pass misc-misleading-identifier detects that configuration and outputs a warning similar to

10:3: warning: identifier has right-to-left codepoints [misc-misleading-identifier]
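As an illustration of what “right-to-left codepoints” means, the following Python sketch uses the standard `unicodedata` module to look up each character's bidirectional class and flags identifiers containing a strong right-to-left character (class `R` or `AL`). This is only a sketch of the idea; the function name is an assumption and the clang-tidy check may apply different rules.

```python
import unicodedata


def has_right_to_left_codepoints(identifier):
    """Return True if any character has a strong right-to-left bidi class."""
    return any(
        unicodedata.bidirectional(ch) in ("R", "AL")  # Hebrew-like, Arabic-like
        for ch in identifier
    )


print(has_right_to_left_codepoints("\u05d0"))  # HEBREW LETTER ALEF
print(has_right_to_left_codepoints("foo"))
```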

Detecting confusing identifiers

Who has never received spam using Unicode characters that look like ASCII characters to bypass some hypothetical anti-spam scanning? C-like languages do not escape the trend, and it is perfectly valid, and confusing, to define

int foo;

at some point of the program, and

int 𝐟oo;

elsewhere. The misc-homoglyph checker detects such confusable identifiers (a.k.a. homoglyphs) based on the confusables.txt list maintained by the Unicode Consortium. In the case above, one would get a warning similar to

7:5: warning: 𝐟oo is confusable with foo [misc-homoglyph]
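One common way to detect confusable identifiers is to map every character to a canonical look-alike and compare the results. The Python sketch below illustrates that idea with a tiny hand-written table standing in for confusables.txt. The table contents, the function names and the approach itself are assumptions for this example and do not describe how misc-homoglyph is implemented.

```python
# Tiny hand-written excerpt standing in for confusables.txt (assumption).
CONFUSABLES = {
    "\U0001d41f": "f",  # MATHEMATICAL BOLD SMALL F
    "\u0430": "a",      # CYRILLIC SMALL LETTER A
    "\u043e": "o",      # CYRILLIC SMALL LETTER O
}


def skeleton(identifier):
    """Replace each character by its canonical look-alike, if it has one."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in identifier)


def find_confusable_pairs(identifiers):
    """Return pairs of distinct identifiers that share the same skeleton."""
    by_skeleton = {}
    pairs = []
    for name in identifiers:
        group = by_skeleton.setdefault(skeleton(name), [])
        if name in group:
            continue  # the same identifier seen again is not a confusable
        pairs.extend((earlier, name) for earlier in group)
        group.append(name)
    return pairs


print(find_confusable_pairs(["foo", "bar", "\U0001d41foo"]))
```

Here the two spellings of foo share a skeleton and are reported as a pair, while bar is left alone.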

Concluding Words

As described in this post, we chose to implement Trojan Source countermeasures as several clang-tidy checkers. Doing so instead of implementing them as Clang warnings is a trade-off on parse time.

The interested reader can discover the alternative GCC approach in this dedicated blog post!

Improving LLVM Infrastructure - Part 1: Mailing lists

Fri, 07 Jan 2022 00:00:00 +0000

When the LLVM Project was open sourced in 2003, it was a small project with a small community. The tools selected to support the project were chosen during a different time and a different situation.

Now, almost 18 years later, the project has grown tremendously and those infrastructure choices are not necessarily the right choices today. According to a recent article, the LLVM community experienced record growth in 2021 with 1400 contributors to the LLVM Project. This is incredible growth! While there is a very real cost to change, it is clear that we need to invest in modernizing these tools to support the current and further growth of LLVM.

Recently, the LLVM Project moved its source code repository and bug tracker to GitHub in an effort to grow the LLVM community and make contributing to the project easier. While these changes were neither quick nor easy, the benefits to the project outweigh the time and effort needed to make the move. The growth in contributors since the move to GitHub is just one metric showing the gains from the change.

This year, the LLVM Foundation, which provides the majority of the LLVM Project infrastructure, will be spearheading the effort to evaluate and identify improvements for the remaining tools and software the project utilizes. We will be looking at mailing lists, chat servers, code review tools, and other parts of our infrastructure. This is part 1 in our blog post series describing the upcoming changes (if any) to LLVM infrastructure.

Project Communication - Mailing Lists

Mailman and IRC have been the primary means of communication for the LLVM Project for the last 18 years. As the project has grown, we added mailing lists for subprojects like Clang, but the main list (llvm-dev) has remained unchanged. This list is meant to cover all of LLVM, which includes everything from mid-level optimizations, to back-end optimizations, to the numerous LLVM backends. It has been the catchall for any topic not covered by another list. Due to the growth in LLVM and the size of the project, this list is very high traffic and requires a lot of work to filter for the messages that one may be interested in. It is increasingly clear that a mailing list may not be the best form of communication, given its lack of features to make it easy to filter, respond to and track topics of interest.

In 2019, LLVM started to experiment with Discourse as an alternative to its mailing lists. Discourse is an open source discussion platform built with an amazing list of modern features, such as the following:

Fully supported Email interface - Discourse supports the ability to interact through email if you do not like to use the web or app interfaces.
Categories and subcategories - This breaks the LLVM project into smaller pieces and users can subscribe to topics that they care most about versus having to do their own filtering.
Single Account Sign-On - Users can use their existing GitHub account to use Discourse and get access to all categories.
Dynamic notifications - Immediately notifies users when they are tagged or get a new reply, and sends an email when they are offline.
Simple - Discourse uses a flat forum where replies are dynamically loaded and flow down the page in a line.
Mobile support - App available for iOS and Android, but you can also just use the web interface.
Better Moderation - The community has the ability to self moderate through flagging and spreads the work across a larger group of moderators.
Enhanced spam blocking - Out of the box spam filtering that eliminates obvious spam before it gets to moderators.
Safety - It has an enhanced trust system that allows the community to build natural immunity from trolls, bad actors, and spammers, and also reinforces positive behavior through likes and badges.

The move to Discourse was discussed extensively on the LLVM mailing lists, and the majority of the community was in favor of it. Discourse provides the features mentioned above in addition to a more modern way of communicating. We did hear of one feature some would miss compared to Mailman: the ability to reply to someone directly through email. However, while it may not be ideal for some, we feel this is a worthwhile tradeoff to gain the other benefits, e.g. better safety for LLVM developers and users in general.

Moving to Discourse

As a consequence of the above discussion, we are planning to move most mailing lists to Discourse. We are excited that it will bring many new features to help improve communication for existing contributors and users, and also make the project more accessible to newcomers as most are familiar with using forums such as Discourse.

However, while Discourse is a great solution for community discussions, it is not clear if it is a good place to host commit messages and post-commit review. Therefore, the commit lists won’t be part of the initial migration to Discourse; they will remain on mailing lists for now.

The plan to move to Discourse will involve the following:

January 7-9 - Re-configure the existing LLVM Discourse to the new category/subcategory structure (see below)
January 10-20 (sometime during these 2 weeks) - The LLVM mailing list archives are migrated to Discourse and it is sanity checked by volunteers of the LLVM community. This sanity check can take a week or more.
Feb 1 - The mailing lists (except commits) are put into read only mode and all users must migrate to use Discourse.
Feb 1-4 (approx) - The final merge is done from Mailman to Discourse. We encourage all LLVM community members to start using Discourse on Jan 10th to minimize any disruption once the mailing lists become read only and the final messages are merged to Discourse. However, please be assured that all mail will be merged to Discourse and you will be able to continue any threads started there once that is completed.

The mailman archives on the LLVM server may eventually be removed, but there is no final decision or deadline on this yet.

Mapping from Mailing Lists to Discourse Categories

The existing LLVM Discourse will be modified to have the following categories/subcategories:

Announcements
Clang Frontend
- Using Clang
- Building Clang
- Static Analyzer
- clangd
Subprojects
- LLD Linker
- Flang Fortran Frontend
- LLDB Debugger
Code Generation
- Common Infrastructure
- AArch64
- ARM
- Mips
- PowerPC
- RISCV
- WebAssembly
- X86
- Other
IR & Optimizations
- Loop Optimizations
Community
- Women in Compilers and Tools
- Job Postings
- US Developer Meeting
- EuroLLVM
- Google Summer of Code
- Community.o
- LLVM Foundation
Project Infrastructure
- Release Testers
- Website
- Documentation
- GitHub
- Code Review
- Discord
- Mailing Lists and Forums
- IRC
- Infrastructure Working Group
- Policies
- LLVM Dev List Archives
Runtimes
- C
- C++
- OpenCL
- OpenMP
- Sanitizers
Incubator
- CIRCT
- mlir-npcomp
MLIR
- Announcements
- Newsletter
- TCP-WG

The Mailman archives will be mapped as follows:

Mailing lists	category in Discourse
All-commits	no migration at the moment
Bugs-admin	no migration at the moment
cfe-commits	no migration at the moment
cfe-dev	Clang Frontend
cfe-users	Clang Frontend/Using Clang
clangd-dev	Clang Frontend/clangd
devmtg-organizers	Obsolete
Docs	Obsolete
eurollvm-organizers	Obsolete
flang-commits	no migration at the moment
flang-dev	Subprojects/Flang Fortran Frontend
gsoc	Obsolete
libc-commits	no migration at the moment
libc-dev	Runtimes/C
Libclc-dev	Runtimes/OpenCL
libcxx-bugs	no migration at the moment
libcxx-commits	no migration at the moment
libcxx-dev	Runtimes/C++
lldb-commits	no migration at the moment
lldb-dev	Subprojects/lldb
llvm-admin	no migration at the moment
llvm-announce	Announce
llvm-branch-commits	no migration at the moment
llvm-bugs	no migration at the moment
llvm-commits	no migration at the moment
llvm-dev	Project Infrastructure/LLVM Dev List Archives
llvm-devmeeting	Community/US Developer Meeting
llvm-foundation	Community/LLVM Foundation
Mlir-commits	no migration at the moment
Openmp-commits	no migration at the moment
Openmp-dev	Runtimes/OpenMP
Parallel_libs-commits	no migration at the moment
Parallel_libs-dev	Runtimes/C++
Release-testers	Project Infrastructure/Release Testers
Test-list	Obsolete
vmkit-commits	Obsolete
WiCT	Community/Women in Compilers and Tools
www-scripts	Obsolete

For help with migrating to Discourse, please see the User Guide and discuss on the LLVM Discourse forums.

LLVM relicensing update & call for help

Thu, 18 Nov 2021 00:00:00 +0000

In this blog post, I’d like to summarize the main points I talked about in the relicensing update presentation at the 2021 LLVM Developer’s meeting.

The very short summary is that we are currently in the long tail phase of collecting relicensing agreements of past contributors. At the time of this writing, we already have more than 94% of older code relicensed. We hope to crowdsource getting through the long tail to get as close to 100% as possible.

Call for help

You can help by looking through the list of remaining individuals and corporations we need to get agreements from, and reaching out to them, or letting us know at [email protected] how we can reach out to them. Also, if you happen to have any other information that you think could help us, please do let us know at [email protected].

More detailed guidance on how you can help is available at https://foundation.llvm.org/docs/relicensing_long_tail.

In the rest of this blog post, I’m going to give a bit more historical background and describe the current status in more detail.

The LLVM relicensing effort: phases over time

The relicensing effort started some time ago. Let me first very briefly describe the various phases before going into more detail. In the years leading up to 2015, it became clear that there were some issues with the license LLVM had at the time.

A different license was the answer to those issues. Between roughly 2015 and 2017 the focus was on deciding what different license would be best suited.

Once it was decided what the new license was going to be, we started working on getting all code to be covered by this new license. That includes getting agreement from all copyright holders of existing code to share their past contributions under the new license. As you can see on the timeline, getting those agreements is the current focus of the relicensing effort.

Maybe we won’t be able to get an agreement for every single past contribution. In that case, we have a number of options to get to the point where we can claim that all code in the LLVM project is covered by the new LLVM license. We call this phase of relicensing “the end game”.

Why relicense?

The old license consists of 3 components: the UIUC license that covered all the code, the MIT license that additionally covered the code in run-time libraries, and a few sentences on granting patent rights in the developer policy.

This caused the following 3 issues:

Some were blocked from contributing because of the text in the patent section in the developer policy. The wording could be interpreted as requiring giving unnecessarily broad access to patent rights when contributing to LLVM. That made it impossible for some companies to contribute.
The run-time libraries were dual licensed under the UIUC and MIT license; the rest of the code only under the UIUC license. Therefore, we could not easily move code to run-time libraries from other parts. The reason run-time libraries were dual licensed was to enable linking to run-time library binaries without requiring attribution to LLVM.
The wording on patent rights in the developer policy was fuzzy and imprecise, leading to uncertainty over whether it did provide the intended protection.

A new license

After exploring a range of options, it was decided that the best solution to solve these issues was to have all code licensed under the Apache-2.0 with LLVM exception license:

Apache-2.0 contains well-understood patent granting which addresses the first and third issue.
The LLVM exception is there for 2 reasons:
- It removes a potential incompatibility with using LLVM code in combination with GPLv2 code.
- It removes the requirement for developers using LLVM tools to tell the users of the binaries they produce that those binaries may contain some code originating from LLVM. Such a situation can easily arise when parts of the LLVM run-time libraries are linked in as part of the normal compilation process.
  The LLVM exception enables the run-time libraries to be covered by the exact same single license as the rest of the code base.

Getting all code covered by the new license

After a decision was made on what the new license should be, we started working on having all code covered by it.

As a first step, we made sure that all new contributions were covered by the new license. This happened after the 8.0 release branch was created. The 100k commits since then are covered by the new license.

The remaining task is to also get the earlier contributions covered by the new license. This consists of about 300k commits totaling about 32 million lines of contributed code.

What needs to be done to get those earlier contributions covered?

The reason we need a license in the first place is copyright. Most code contributions are covered by copyright, which means that the person or company owning the copyright has a lot of decision power over what can legally be done with that code. By covering the code with a license, it becomes clear what others are permitted to do with that code. If there isn’t a license associated with copyrighted code, there isn’t all that much useful that others can do with it.

Basically, to get existing code to be covered by a new license, we need to find who owns the copyright on it, and ask them to agree to offering their copyrighted work under the new license.

The copyright owner can be either an individual, for example the person who wrote the code originally; or a company, for example a company that employed the person who wrote the code.

Asking agreement from copyright owners

We started asking for their agreements. By examining the version control log of the 300k-ish commits we need to get agreements for, we found that about 2800 different people or email addresses made a contribution.

We reached out to all of them and asked them two things. First, whether any corporation may own the copyright on any of their contributions. Secondly, whether they agree with relicensing the contributions they own the copyright on personally.

So far, we’ve heard of about 220 unique corporations potentially owning the copyright on some past contribution. We also started reaching out to those, but have not asked every single one just yet.

Status as of November 2021

So, 2.5 years after we started asking, where are we with getting agreements to relicense?

The chart below summarizes the current status. It shows a treemap where each rectangle represents the contributions made by one person. The size of each rectangle represents how many lines of code the person contributed.

When the rectangle is green, it means all their contributions are fully covered by relicensing agreements.

When the rectangle is orange, it means we have not yet received such an agreement. When the rectangle is orange with green stars, it means some contributions by that person are covered and others not. This can happen for example when the person has worked for multiple companies over time and only some of those companies have agreed with the relicensing so far.

We already have over 94% of all contributed lines of code between 2001 and 2019 covered by a relicensing agreement. We still have a good 5% of lines of code to go.

As you can see, most of the missing agreements are with “long tail” contributors. By long tail contributors we mean the many contributors who each contributed relatively few lines of code. We focused on reaching out to the bigger contributors first. To reach out well to the long tail of contributors, we’re hoping to get help from the wider LLVM community.

Help wanted!

Please do consider helping us with reaching any of the people or corporations in the long tail. Please have a look at the up-to-date list of people and corporations we can use help with getting in touch with.

You can find more detailed guidance on how you can help on the LLVM foundation website’s relicensing long tail page.

If you do think you could help us with reaching out to someone on the list, or you may have some other information that could help us, please do let us know by emailing [email protected].

Relicensing end game

We are currently in the phase of getting as many relicensing agreements as possible. We do expect that we may not be able to get absolutely 100% of all past contributions covered by an agreement. What can we do to get current top-of-mainline fully covered by the new license?

We will need to decide on a contribution-by-contribution basis which options are available to achieve that goal. We have at least the following options.

We can check if copyright even applies to the particular contribution. Very small contributions may not be covered by copyright, and hence may not need a license agreement.
It may well be that code contributed a long time ago is no longer in the code base.
If copyright does apply and the code is still in the code base, we can remove the contribution. Depending on whether current contributors and users still value the effect of that contribution, it may need to be reimplemented.

apt.llvm.org - moving from physical server to the cloud

Tue, 02 Nov 2021 00:00:00 +0000

In this blog post, I would like to explain how I migrated apt.llvm.org from physical hardware hosted in a datacenter to the Google cloud. This has resulted in better security and faster builds for LLVM Debian/Ubuntu nightly builds.

Previous infrastructure

About 10 years ago, Nick Lewycky from Google offered to replace my mix of old and terrible servers with 16 blade servers in a chassis and a control server. Below are pictures of the front and back of these servers, which have served us faithfully for 10 years:

The hosting was sponsored by IRILL, at Inria, next to Versailles.

This setup was running using a PXE install with preseed to set up everything on the 16 blades.

While it served us very well for years, it is now holding us back.

Some of the blades are dying;
Old hardware - it takes from 4 to 7 hours per build;
All blades are always on even if no build is executed;
Even if it happened rarely, I had to drive to the DC to fix some systems.

New infrastructure

As part of the OpenSSF initiative, Google and the Linux Foundation offered a budget to migrate this service to Google Cloud Platform (GCP).

As Jenkins has been working very well, I reused this approach and imported the previous configurations (available on https://github.com/opencollab/llvm-jenkins.debian.net/tree/master/jobs).

I followed most of the tutorial provided on GCP, Using Jenkins for distributed builds on Compute Engine.

The architecture is the following:

Similar to puppet or chef, Salt is the configuration management tool used to configure the various systems. A salt server configures both the Jenkins controller and the reference node. This node is a preconfigured VM with all the dependencies and tools to start the build. This node is halted most of the time and only started when the image needs to be updated.

For example, the list of Virtual Machines now usually looks like the following screenshot:

5 build VMs - off after 5 minutes without activity
The template VM (off as it is only started to update the image)
The jenkins orchestrator

Build servers

The controller is always on and starts/ends the build nodes.

The reference node is configured and updated using this script, based on gcloud, the Google cloud CLI:

https://github.com/opencollab/llvm-jenkins.debian.net/blob/master/image-update.sh

This script performs the following steps:

Starts the reference node and then connects to it over ssh.
Performs the needed changes (e.g. a new chroot for a new distro version, new build hooks, etc). Usually:
- Salt update
- Refresh the chroots
- git pull llvm-project
Once the changes are performed:
- Stops the VM.
- Creates a new image.
- Archives the old image.
This image will be used by Jenkins to start new VMs.

The node image contains the various i386 and amd64 chroots for supported Debian and Ubuntu installs. The LLVM repository is also checked out to avoid doing a long clone with many files on every VM startup.

Thanks to this, the jobs just need to perform a small git pull instead of a full clone.

The size of this image is 6.67 GB. This approach saves about 10 minutes of the build pipeline (skipping the creation of the chroot + full clone).

This wasn’t easy. I had to iterate 25 times to have a proper image supporting all the use cases.

GCP provides a shared file system mechanism, available as an NFS mount, called Filestore. Each build node has access to it and will update the various repositories just as if it were a local file system.

Benefits and cost

The first benefit is security. This is based on the Google cloud infrastructure, so it benefits from all their security work.

The fact that the VMs are always cycled and recreated from a clean image will also help. They are not directly accessible from the Internet (they leverage cloud NAT to be able to download packages from the Internet), so it will be harder to attack.

An amd64 build, now run in parallel, takes about 50 minutes instead of almost 7 hours.

In terms of cost, depending on the activity, the daily cost varies from 70 to 150 US$.

Now that the migration has been completed, it is possible to consider some builds leveraging PGO and LTO for faster binaries.

Thanks

Thanks to Laurent Vaylet from the GCP team for the help and patience answering my newbie questions, and also thanks to Google and the Linux Foundation for the GCP credits and support.

Generating relocatable code for ARM processors

Fri, 01 Oct 2021 00:00:00 +0000

Abstract

By upgrading the LLVM compiler, we solved the problem that neither LLVM nor GCC could create correct Position Independent Code for Cortex M controllers when the code runs in Flash memory rather than in RAM. Now the binary image of the program can be flashed to an arbitrary address and run from it, without being moved to another place.

Updating the microcontroller’s “firmware” is a dangerous process. Before, any hardware failure during the update could brick your device. Nowadays, devices usually have their own boot loader which lets you restart the update before your device’s functionality is lost. Until the update is complete, the device will not work. The fanciest way to update is to use two separate regions for the “firmware”: the main region and the reserve region. In the picture below, they are marked in red and blue respectively. Initially, the red region is active, while the update will be loaded into the blue region. This way boot failure is not a big deal. The red region will still be running everything. And if the update is successful, the blue region will become active. The next update will be loaded into the red region, and so on. Each update will lead to this switch.

Unfortunately, with Cortex M systems such an approach cannot be used directly. The program is tied to absolute addresses and cannot be run in an arbitrary place. This article explains why that is the case and how we made the program relocatable by modifying the LLVM compiler.

Introduction

Someone who has read the documentation may say that compilers already have options to create relocatable code. In the case of LLVM it is -fropi, -frwpi and others. That is where we started our tests too. It turned out that these options were very convenient for the systems where the program was loaded entirely into RAM. In this case, both constants (which were located in the code) and variables (which were located in the data section) were located in the same big segment. This is why it could be easily moved within the RAM.

For Cortex M controllers it was not that easy. Their code was located in a high-volume ROM while the data was stored in a low-volume RAM. In addition, those entities were located in different parts of the Flash/RAM.

Using these options led to one of two outcomes. Either the data was shifted by the same offset as the program, but since RAM was much smaller than ROM, the data went out of the allowed bounds.

Or a huge table of pointers to the code constants was created in RAM. The size of this table significantly increased the amount of RAM used. Therefore, sometimes the amount of RAM provided by Cortex M controllers was just not enough.

It became clear that for the target platform (Cortex M controllers) we had to make changes to the compiler.

Basic Principles

To keep things coherent, let’s start with the basics. In computers, data and code are located in the global memory. In the given architecture (just like with almost all other architectures) a linear address space is used, so the location of an object can be defined by a number. In order to perform a certain operation with a memory cell, the processor must know the address of this cell.

This is where things became complicated. Both the maximum instruction size and the address size are 32 bits. Therefore, you could not simply embed a 32-bit address into an instruction – the instruction also had to have room for its own opcode. One of the solutions was to use relative addressing. At runtime, the program counter register contained the address of the current instruction. It was a full-size 32-bit register whose contents were controlled by the hardware. By reading it, the program could learn its own location in the Flash. There was enough space within the instruction for some offset, so a significant range of addresses around the current location became available. For example, if you needed to call a function that was located near the currently running function, the processor performed the jump with one instruction.

Here, function MyFunc is located near the call location:

        bl MyFunc
        ...
        .type MyFunc,%function
    MyFunc:
        ...

However, it was only a partial solution, because not all jumps were relative; some were absolute. There was a solution though. We put the absolute address into the Flash directly within the code, somewhere near the place where it was being used. Then, in a similar fashion, we performed a PC-relative load of this address into a register and, with the next instruction, loaded the value itself relative to that register. A more advanced method was to use a movw, movt instruction pair. The target 32-bit address was cut into two halves, which were loaded into the register in two steps. Although it meant using two instructions, it avoided the extra memory access.

Loading the contents of global_result_data7 address into the r0 register:

        movw r0, :lower16:global_result_data7
        movt r0, :upper16:global_result_data7
        ldr  r0, [r0]

Let’s take a closer look at the register loading process, where the variable global_result_data7 had the address 0x12345678:
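As a rough illustration of that split, the two halves of 0x12345678 and the way the register is assembled from them could be modelled as follows. This is only a host-side sketch: the helper names lower16, upper16 and movw_movt are made up here to mirror the assembler operators and are not part of any toolchain API.

```cpp
#include <cassert>
#include <cstdint>

// Immediate loaded by movw: the low 16 bits of the address.
constexpr std::uint16_t lower16(std::uint32_t addr) {
    return static_cast<std::uint16_t>(addr & 0xFFFFu);
}

// Immediate loaded by movt: the high 16 bits of the address.
constexpr std::uint16_t upper16(std::uint32_t addr) {
    return static_cast<std::uint16_t>(addr >> 16);
}

// Register value after movw (writes the low half, clears the high half)
// followed by movt (writes the high half, keeps the low half).
constexpr std::uint32_t movw_movt(std::uint16_t lo, std::uint16_t hi) {
    return (static_cast<std::uint32_t>(hi) << 16) | lo;
}

// Worked example for the address used in the text.
inline void demo_split() {
    assert(lower16(0x12345678u) == 0x5678u);
    assert(upper16(0x12345678u) == 0x1234u);
    assert(movw_movt(0x5678u, 0x1234u) == 0x12345678u);
}
```

So movw carries 0x5678, movt carries 0x1234, and only after both instructions does r0 hold the complete address that ldr then dereferences.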

Now that we have figured out how the processor operates with addresses, it is time to explain where they come from. The Flash/RAM is (with some exceptions) uniform, which means that the function at the address 0x1000 may just as well be located at 0x2000; the only requirement is for the address to be known. The linker is responsible for address assignment. It receives all objects of the program as input and allocates space for them according to the attached configuration file. This way, every object gets its own address, and these addresses are written into the places where the objects are used. Moreover, it is worth noting that Flash/RAM does not have to start at address zero; the starting address 0x08000000 is quite common as well. This value must be added to all global addresses.

Problem Definition

So, we had a binary image, suitable for loading into the device’s Flash/RAM. When we wrote it starting at the initial address defined at linking time and then ran it, the program worked. All data was located in the expected places, all the called addresses had the necessary functions, and so on. One may wonder: what if we had loaded the image at a different address? The first thing that comes to mind is that everything would have fallen apart. The program would have read the cell at the address 0x1000, but the value would actually be located in a completely different place. However, let’s imagine a basic program consisting of only one instruction – an infinite loop. Obviously, such a program was quite relocatable: because such a short jump was performed PC-relatively, it automatically “adjusted” to its new location. Moreover, if all jumps/calls in the program were relative, it could be quite big and complex; at least all functions were called correctly. Everything was fine until the program tried to use a global address whose value was offset relative to the expectations.

Now it is time to ask a question: was it an issue? What’s wrong with the program being tied to absolute addresses? What’s the benefit of a program that can run from an arbitrary address? One can argue that it is a good thing if it is cost-effective. Still, it could have a much more specific use. For example, the device could receive a code snippet from the outside, which would expand the features of the already existing program, and load it as an addition to the current code. It would be very convenient if this code could work regardless of its location in the Flash. Lastly, there would be the possibility of releasing a complete firmware update. It also had to be located somewhere in the Flash and then given control of the device. Notably, we do not know where it ends up, so the ability to run from arbitrary addresses is a necessity.

Therefore, the task of achieving relocatable code is worth looking into. It should be noted that in serious systems it is solved by using virtual memory. Logical addresses used in the program are silently mapped to physical addresses, so there are no problems. However, that is a completely different technological level. We were focused on the Cortex-M, which has no MMU. This is why in our case we had to “adjust” the global addresses by adding the program’s memory offset value. Because all address calculations came down to offsets and differences of pointers, no other changes were required.

It brought up a new task – to retrieve the program’s offset value in the Flash relative to its initial address that was defined at linking time. For example, one could use the following trick. Global addresses remained the same, while the PC value was different. You took the difference between a global address and the PC value when running from the “normal address” and “hardcoded” it into the program. When the program ran from an offset address, this difference changed, and calculating how much it had changed gave you the offset value. However, as will be shown later, there is a more direct way to get the offset value, so for now, let’s just say that the offset value is known.
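The trick can be written down as plain arithmetic. The sketch below is only a model that runs on any host: the function names and the example addresses are hypothetical, and on a real device the PC value would be read by a short assembly sequence rather than passed in as a parameter.

```cpp
#include <cassert>
#include <cstdint>

// Distance from the PC at some fixed instruction to a global address.
// Unsigned arithmetic wraps modulo 2^32, which is what we want here.
constexpr std::uint32_t global_minus_pc(std::uint32_t global_addr, std::uint32_t pc) {
    return global_addr - pc;
}

// hardcoded_diff was computed for the link-time location, runtime_diff is
// measured by the running program. The global address is the same in both,
// so their difference is how far the PC (and thus the image) has moved.
constexpr std::uint32_t load_offset(std::uint32_t hardcoded_diff, std::uint32_t runtime_diff) {
    return hardcoded_diff - runtime_diff;
}

inline void demo_offset() {
    const std::uint32_t global = 0x08001000u;   // hypothetical global address
    const std::uint32_t pc_link = 0x08000100u;  // PC when run from the link-time address
    const std::uint32_t pc_run = 0x08004100u;   // PC when the image sits 0x4000 higher
    const std::uint32_t hardcoded = global_minus_pc(global, pc_link);
    const std::uint32_t measured = global_minus_pc(global, pc_run);
    assert(load_offset(hardcoded, measured) == 0x4000u);
}
```

When the image runs from its link-time address the two differences are equal and the computed offset is zero.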

Implementation (Initial Approach)

So, the processor loaded the global address into the register. When you use assembler, you only have to insert a specific instruction right after it, which will add the offset value to the result.

The DoOperation function receives the global address global_operation5, adjusted by the value in the r9 register:

        movw r0, :lower16:global_operation5
        movt r0, :upper16:global_operation5
        add  r0, r9
        bl   DoOperation

To do this, we had to reserve a register for permanent storage of this value. Certainly, we could load it from RAM each time, but the expected losses (ROM volume and processor cycles) were higher than the cost of losing one register. But what should we do if we coded in C? Obviously, the whole scheme only made sense if we did not have to modify the source code. It’s one thing to make some special manipulations here and there, but no major changes are allowed. Fortunately, the compiler could deal with the given task with only a minor modification.

The thing is, the compiler knows perfectly well what exactly it is doing. We used LLVM because we had extensive experience in modifying this compiler, so going forward we will discuss LLVM only. LLVM has a mechanism of separate address spaces, which allows us to bind to a pointer an attribute that defines the data location. In our case, we assumed that only ROM content was moved, while RAM addresses stayed the same. Therefore, we set a separate address space for constant global objects (functions, constant data, string literals – everything that went into the read-only memory).

Here we set a separate address space for constant data and map it to global objects:

    llvm::Optional<LangAS> ARMTargetInfo::getConstantAddressSpace() const {
      return getLangASFromTargetAS(REL_ROM_AS);
    }

    LangAS getGlobalVarAddressSpace(CodeGenModule &CGM,
                                    const VarDecl *D) const override {
      if (D && CGM.isTypeConstant(D->getType(), false)) {
        auto ConstAS = CGM.getTarget().getConstantAddressSpace();
        assert(ConstAS && "No const AddressSpace");
        return ConstAS.getValue();
      }
      return TargetCodeGenInfo::getGlobalVarAddressSpace(CGM, D);
    }

This attribute “lived” inside the type during the whole compilation; when an object address was loaded, it let us recognize the ROM allocation and insert the instructions necessary for the offset.

Static Initialization Issue

Unfortunately, there was a scenario that we could not process with the given method. We had a global pointer array, which was initialized with a mix of ROM and RAM addresses.

The array arr contains two addresses, one in ROM and one in RAM:

    const int ro = 0;                  // read-only, placed in ROM
    int rw;                            // read-write, placed in RAM
    const int *arr[] = { &ro, &rw };

This initialization was performed statically, which meant that the linker allocated Flash and filled it with numbers, which were represented as symbolic addresses.

Static initialization of array arr:

        .type arr,%object
        .section .rodata,"a",%progbits
    arr:
        .long ro
        .long rw
        .size arr, 8

It did not know the future offset value yet, so it could not do anything. It should be noted that linkers come in all shapes and sizes and can have various advanced features, but at that time we stuck to simple binary images. So, we had an array of numbers. Still, we could do nothing at runtime either, because we no longer knew which addresses were from ROM and which were from RAM. So, did our whole idea just go down the drain?

Essentially, there was a very costly but quite versatile solution. If we knew the RAM and ROM address ranges, we could route every address through a special function that, based on the address value, determined where it came from and modified it if necessary. However, the overhead this required was enormous. The solution was therefore purely theoretical and, obviously, unusable for real-world applications, apart from some individual cases.
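To make the cost concrete, here is a minimal sketch of what such a function might look like. The RomRange type, the function name and the address ranges are all hypothetical; the article does not show an implementation of this idea.

```cpp
#include <cassert>
#include <cstdint>

// Link-time ROM range [begin, end). The values used below are hypothetical.
struct RomRange {
    std::uint32_t begin;
    std::uint32_t end;
};

// Called on every address before it is used: ROM addresses are shifted by the
// load offset, everything else (RAM, peripherals) is returned unchanged.
inline std::uint32_t fix_address(std::uint32_t addr, const RomRange& rom, std::uint32_t offset) {
    const bool in_rom = addr >= rom.begin && addr < rom.end;
    return in_rom ? addr + offset : addr;
}

inline void demo_fix_address() {
    const RomRange rom = {0x08000000u, 0x08100000u};
    assert(fix_address(0x08001000u, rom, 0x4000u) == 0x08005000u);  // ROM: adjusted
    assert(fix_address(0x20000010u, rom, 0x4000u) == 0x20000010u);  // RAM: untouched
}
```

Two comparisons and a conditional add on every pointer dereference is the overhead that made this approach impractical.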

Implementation (New Approach)

Ideally, the addresses would be modified while the firmware was being loaded. Of course, that required us to change the program that loaded the image into the Flash, as well as to provide additional information about the image itself, but the idea sounded plausible. As was said above, the ability to run from any loading address came at the cost of a small increase in code size and slightly lower performance. If we fixed all the affected addresses directly in the binary image, relocatability could be achieved without the aforementioned performance/efficiency losses. There were possible pitfalls, but the idea was worth trying out.

The first method was rough but promising. What if we built two firmware variants linked for different load addresses and compared them afterwards? Only the addresses should have changed, so we would see all the places that needed modification. However, it turned out that there were many more differences than that. Some of them might have been disregarded as irrelevant to our task, but there were no guarantees that we would always be able to distinguish real address differences from artifacts. In the end, the whole method sounded too naive for serious applications.

Taking the above into consideration, it was obvious that we had to change at least some of the addresses used for global pointer initialization. For simplicity, we supposed that during code generation an intermediate assembler file was created and that we had the ability to intervene in this process. A new idea emerged then. Each time the compiler used a global address for initialization, we could see which address space it used, and if it was from ROM, we could put a label before it.

We put markers before each ROM address initialization:

    mainMenu:
    Reloc2:
        .long mainMenuEntriesC
        .long mainMenuEntriesM
    Reloc3:
        .long mainMenuEntriesC+8

At the end of the module, we added a section with a special name and put all those labels in it.

Labels are put into the reloc_patch section:

        .section .reloc_patch,"aw",%progbits
        .globl _Reloc_3877008883_
    _Reloc_3877008883_:
        .long Reloc1
        .long Reloc2
        .long Reloc3

In the linker script, we marked this section as KEEP so it would remain intact, because, obviously, nothing referenced its data. Later, when the executable file was being linked, all the added sections were joined together and the labels got specific values corresponding to the binary image addresses. The key point here is that those addresses were equal to the offsets in the output file. Therefore, we could locate the places that needed changes. One subtle aspect should be noted: if the initialized data was located in RAM, its initial values were located in ROM at a known offset. Therefore, we needed two sections: one for ROM data and one for RAM data. The first was processed as described above, while for the addresses in the second section we had to subtract the initial RAM address and add the initialization block offset, which was defined in the linker script file.
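The two conversions described above amount to a few lines of arithmetic. The following is a sketch under stated assumptions: the ImageLayout structure, its field names and the example numbers are hypothetical, and rom_start is included only so that images not linked at address zero can be modelled (with rom_start equal to zero, a ROM label equals its file offset, as in the text).

```cpp
#include <cassert>
#include <cstdint>

// Values that would come from the linker script (all hypothetical here).
struct ImageLayout {
    std::uint32_t rom_start;          // link-time address of the first image byte
    std::uint32_t ram_start;          // link-time start of the initialized RAM data
    std::uint32_t init_block_offset;  // image offset of the RAM initializer block
};

// Label collected from the ROM section: its position in the image is its
// distance from the start of the image.
inline std::uint32_t rom_label_to_image_offset(std::uint32_t label, const ImageLayout& layout) {
    return label - layout.rom_start;
}

// Label collected from the RAM section: subtract the initial RAM address and
// add the offset of the initialization block inside the image.
inline std::uint32_t ram_label_to_image_offset(std::uint32_t label, const ImageLayout& layout) {
    return label - layout.ram_start + layout.init_block_offset;
}

inline void demo_label_offsets() {
    const ImageLayout layout = {0x00000000u, 0x20000000u, 0x00003000u};
    assert(rom_label_to_image_offset(0x00001234u, layout) == 0x1234u);
    assert(ram_label_to_image_offset(0x20000010u, layout) == 0x3010u);
}
```

Either way the result is a position inside the binary image where a 32-bit ROM address is stored and has to be corrected.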

Then we wrote a small utility program that recovered our sections from the ELF representation and got the offset list for the global addresses.

It should be noted that there is a simple way to get relocation tables using standard means: a linker keeps the relocation sections in the ELF output when the -q option is set. These tables contain all the data we retrieved using the method described above and can be used for our purposes as well. However, they are too big, while we aimed at lower memory usage. Moreover, we would use only small parts of them; therefore, we would have to deal with parsing the relocation tables. On the one hand, we would save effort by leaving the compiler as it is; on the other, it would bring up another problem. Thus, we decided in favor of the method described above.

Development of the Approach

That was enough to modify the firmware at load time, but we went even further. We defined a simple set of commands, like these:

'D' [qty] {data} – copy the following qty bytes from the input stream to the output

'A' [qty] – interpret the following qty values as addresses, add a specific value to each of them and write them to the output

Send four bytes to the output stream and correct two addresses:

'D' 0x4 0x62 0x54 0x86 0x12 'A' 0x2 0x00001000 0x00001008

If the offset is 0x4000, the result would be like this (for clarity we do not decompose the numbers into bytes):

0x62 0x54 0x86 0x12 0x00005000 0x00005008

Then we turned the initial binary image into a stream of such commands. All data up to the first address that needed modification was passed through unchanged, then came a command that modified one or several addresses, then some data again, and so on, until the end of the file. That way, the information for address correction was embedded into the binary image. On the one hand, the resulting file was no longer a firmware image suitable for direct flashing. On the other hand, we were able to process it “on the fly” in streaming mode as we received it, using a small buffer. Thus, we had to modify the loader program a little so that it changed the addresses according to the received commands rather than just writing the input stream into the board’s Flash/RAM. Moreover, we created an additional utility program that accepted such a file and a starting address as input and produced a ready-to-flash firmware image.
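A loader for such a stream can be very small. The sketch below is not the authors' implementation: the article does not specify the byte-level encoding, so it assumes a one-byte command, a one-byte qty and little-endian 32-bit addresses, and it collects the output in a vector where a real loader would write to Flash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::vector<std::uint8_t> Bytes;

// Assumed encoding: [command byte][qty byte][payload]; addresses are
// little-endian 32-bit words. Returns false on a malformed stream.
inline bool apply_patch_stream(const Bytes& in, std::uint32_t offset, Bytes& out) {
    std::size_t pos = 0;
    while (pos < in.size()) {
        if (in.size() - pos < 2) return false;        // truncated header
        const std::uint8_t cmd = in[pos];
        const std::size_t qty = in[pos + 1];
        pos += 2;
        if (cmd == 'D') {                             // copy qty raw bytes
            if (in.size() - pos < qty) return false;
            for (std::size_t i = 0; i < qty; ++i) out.push_back(in[pos + i]);
            pos += qty;
        } else if (cmd == 'A') {                      // relocate qty addresses
            if (in.size() - pos < qty * 4) return false;
            for (std::size_t i = 0; i < qty; ++i, pos += 4) {
                std::uint32_t addr = 0;
                for (int b = 3; b >= 0; --b) addr = (addr << 8) | in[pos + b];
                addr += offset;
                for (int b = 0; b < 4; ++b)
                    out.push_back(static_cast<std::uint8_t>(addr >> (8 * b)));
            }
        } else {
            return false;                             // unknown command
        }
    }
    return true;
}

// The example from the text: four data bytes, then two addresses, offset 0x4000.
inline void demo_patch_stream() {
    const Bytes in = {'D', 4, 0x62, 0x54, 0x86, 0x12,
                      'A', 2, 0x00, 0x10, 0x00, 0x00, 0x08, 0x10, 0x00, 0x00};
    const Bytes expected = {0x62, 0x54, 0x86, 0x12,
                            0x00, 0x50, 0x00, 0x00, 0x08, 0x50, 0x00, 0x00};
    Bytes out;
    assert(apply_patch_stream(in, 0x4000u, out));
    assert(out == expected);
}
```

Because each command is handled as soon as its payload is available, only the current command needs to be buffered, which is what makes streaming with a small buffer possible.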

We went even further. As mentioned above, some addresses were “hardcoded” into the movw, movt instruction pairs. The compiler could tell which of them corresponded to ROM address loading, put labels there and make another section for them. Also, we added a command that took two words from the stream, interpreted them as a load instruction pair, retrieved the address, changed it, and then put it back in. Thus, the additional steps at runtime became unnecessary. Apart from that, we got the ability to modify the program with ease (for example, changing the version number, etc.).
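Patching such a pair means unpacking the 16-bit immediate that Thumb-2 scatters across each instruction. The sketch below assumes the 32-bit Thumb-2 MOVW/MOVT encodings, where imm16 is stored as the fields imm4:i:imm3:imm8, and works on halfwords to stay independent of host byte order. The ThumbInsn type and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// One 32-bit Thumb-2 instruction as its two halfwords, in stream order.
struct ThumbInsn {
    std::uint16_t hw1;
    std::uint16_t hw2;
};

// imm16 = imm4:i:imm3:imm8 (imm4 and i live in hw1, imm3 and imm8 in hw2).
inline std::uint16_t get_imm16(ThumbInsn insn) {
    const unsigned imm4 = insn.hw1 & 0xFu;
    const unsigned i = (insn.hw1 >> 10) & 0x1u;
    const unsigned imm3 = (insn.hw2 >> 12) & 0x7u;
    const unsigned imm8 = insn.hw2 & 0xFFu;
    return static_cast<std::uint16_t>((imm4 << 12) | (i << 11) | (imm3 << 8) | imm8);
}

// Rewrites only the immediate fields; opcode and destination register stay.
inline ThumbInsn set_imm16(ThumbInsn insn, std::uint16_t imm) {
    const unsigned hw1 = (insn.hw1 & 0xFBF0u) | ((imm >> 12) & 0xFu) | (((imm >> 11) & 0x1u) << 10);
    const unsigned hw2 = (insn.hw2 & 0x8F00u) | (((imm >> 8) & 0x7u) << 12) | (imm & 0xFFu);
    return ThumbInsn{static_cast<std::uint16_t>(hw1), static_cast<std::uint16_t>(hw2)};
}

// Adds offset to the 32-bit address carried by a movw/movt pair.
inline void relocate_pair(ThumbInsn& movw, ThumbInsn& movt, std::uint32_t offset) {
    const std::uint32_t old_addr =
        (static_cast<std::uint32_t>(get_imm16(movt)) << 16) | get_imm16(movw);
    const std::uint32_t new_addr = old_addr + offset;
    movw = set_imm16(movw, static_cast<std::uint16_t>(new_addr & 0xFFFFu));
    movt = set_imm16(movt, static_cast<std::uint16_t>(new_addr >> 16));
}

inline void demo_relocate_pair() {
    ThumbInsn movw = {0xF245, 0x6078};  // movw r0, #0x5678
    ThumbInsn movt = {0xF2C1, 0x2034};  // movt r0, #0x1234
    relocate_pair(movw, movt, 0x4000u);
    assert(get_imm16(movw) == 0x9678u && get_imm16(movt) == 0x1234u);
    assert(movw.hw1 == 0xF249u && movw.hw2 == 0x6078u);
}
```

The address has to be reassembled from both instructions before the offset is added, because a carry out of the low half changes the movt immediate as well.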

This also allowed us to give the program the offset value if necessary. To do that, we created a function with a special name that wrote a constant value to a RAM cell. In code, this function started with two movw, movt pairs – the first loaded the RAM cell address and the second loaded the constant itself.

Retrieving the offset in the RAM cell and r9 register:

    int rel_code = 0;

    int set_rel_code() {
        rel_code = 0x12345678;
        return rel_code;
    }

    void __attribute__((section(".text_init"))) Reset_Handler(void) {
        set_rel_code();
        asm("mov r9, r0");
        ...
    }

We added another stream command that, instead of adding the offset to the value hardcoded in the load instruction pair, replaced that value with the one given. As a result, the function put the offset value itself into RAM, and after the function’s call this value was available. All in all, that opened quite a broad range of possibilities, so the additional difficulties look justified.

Current Limitations

Undoubtedly, the need to work with the intermediate assembler file is a flaw of the implementation, although it is hard to say what exactly it affects, so it can be tolerated for now. Perhaps we will get rid of it in future versions by creating labels directly within the compiler’s internal representation. It is also unfortunate that the service sections used for retrieving the offsets within the binary image actually take up memory space. However, we could give them fictional addresses where no Flash/RAM is present, as long as this causes no errors during loading.

Conclusion

The LLVM compiler was modified to generate binary code that can be bound to any address before being loaded into the flash memory, without requiring the source code or a development environment. All the necessary information is contained in the binary code.

Meet the LLVM Outreachy Interns!

Wed, 14 Jul 2021 00:00:00 +0000

The LLVM Project is participating in the Outreachy program for the first time. Two interns have been selected: Sushma Unnibhavi & Pooja Yadav.

Outreachy provides paid, remote internships with the goal of increasing diversity in open source. Outreachy interns work with mentors from open source communities on projects in programming, user experience, documentation, illustration, graphical design, data science, project marketing, user advocacy, or community event planning.

Pooja will be working with her mentor Kit Barton on her project to Create Documentation and Tutorials for the LLVM Global Instruction Selection Framework. Sushma will be working with her mentor Anshil Gandhi on her project to Implement GlobalISel for the M68k backend in LLVM.

Thank you to the sponsors of the LLVM Foundation as their support has made these internships possible. In addition, we would like to give a huge thank you to Kit and Anshil for mentoring!

We asked Sushma and Pooja a few questions about themselves and the project they are working on. Here are their answers:

Sushma Unnibhavi

Can you tell us about yourself and your background?

I am Sushma, a final year Information Science Engineering undergrad from India. My hobbies include drawing, dancing and reading. I love writing code. I have been obsessed with the idea of using software to solve practical problems. I love to dig into problems and solve them with modern technology. I am constantly learning because I never settle. I focus on making high-quality decisions and love meeting new people and hearing new perspectives. My specialities include learning new skills and programming languages and problem-solving.

How did you hear about Outreachy and why did you apply?

I first came to know about Outreachy from my brother and searched for it right away. I found out that Outreachy seeks interns who are talented and have a zeal to learn while most of the other internships sought for the experience. I have always wanted to contribute to open source but never got proper guidance. Then I read the experiences of previous Outreachy interns which really motivated me. Many of them didn’t have any prior experience and they were supported so much from the mentors that made them reach greater heights. This made me think… If they can, why can’t I? Hence I decided to apply to Outreachy. The way I have seen myself grow by contributing to LLVM in the one month time span of the contribution period is really surprising.

Which project will you be working on?

I will be implementing GlobalISel for M68k. I will be adding the minimal support necessary to select a function that returns the sum of two i32 values. This includes some very naive and hardcoded argument/return lowering as well as handling of copy and add instructions throughout the GlobalISel pipeline. I will also be implementing lowering for operations including add, sub, mul, div, comp, phi, load and store.

What are you most looking forward to during your internship?

One thing I am looking forward to in this internship is a great learning experience. When I first started contributing to LLVM, I thought I wouldn’t stand a chance because I had never worked on a project with such a huge codebase and that too working on compilers was a nightmare for me. Through this internship I want to let go of this fear and become more confident. I want to prove to myself that with hard work and consistency nothing is impossible.

Pooja Yadav

Can you tell us about yourself and your background?

Hi! I am Pooja Yadav from India. I am currently pursuing my undergraduate degree in Computer Science and Engineering at National Institute of Technology, Goa, India. I have been interested in computer science since high school and feel lucky to get the branch of my choice in one of the deemed universities in India for my B.Tech degree. I like badminton and skipping in sports and I was a national player in rope skipping at my high school. Sometimes, to relax, I draw sketches and play with paints. It is really therapeutic.

I am also a supporter of equity and diversity in the tech field. I am the General Secretary of the SPIE student chapter of our college and we have organised many events to support equity, diversity and inclusion.

How did you hear about Outreachy and why did you apply?

I heard about Outreachy in my second year of college from one of my seniors who is also an Outreachy alum. So, I also thought about participating in it. Outreachy provides great opportunity to people subjected to systematic bias and underrepresented in the technical industry. This was what encouraged me to participate in it. I got introduced to the vast ocean of computer science field in my first year of college and it seemed so overwhelming to me. So, I thought Outreachy would be a good place to start and that is how I got introduced to open source. I tried for Outreachy internship last year also but my final application was not selected. However, I learned a lot of things and was determined to try this year as well. I met many amazing people during the contribution period. All mentors and participants were very friendly and it was always a pleasure to talk to them. Mentors were very patient to answer all my questions, even the dumbest one. Also, the projects were so exciting and I wanted to dive into one of them because I knew that would raise my learning curve to another level. And this year the LLVM project on which I would be working got my interest. So, I decided to contribute to it and luckily I got selected for it.

Which project will you be working on?

The project on which I am working is ‘Create Documentation and Tutorials for the LLVM Global Instruction Selection Framework’. Working on it was quite adventurous during my contribution period and currently I am learning a lot of things. This project is about reading present documentation of GlobalISel and updating or making the required changes. Then make a good tutorial for GlobalISel which would help many contributors in future; beginners as well as professionals. For creating documentation and tutorials I have to understand everything about GlobalISel. So I am learning along as I move ahead in my project.

What are you most looking forward to during your internship?

I had never heard about LLVM IR before. When I first read about it I was intrigued by the idea of improving the performance of a compiler with one more intermediate representation, i.e. machine IR, and how we can write a generalised IR for compilers so that we don’t need to do it from scratch for every backend. I have learnt many things while working on this project, which include other things apart from this project topic like the concept of compilers, git, open source contributing rules, interaction with other contributors, communication skills, etc.

I am looking forward to learning more about this project and contributing my best to it. I like to explore different fields and this is a great opportunity to observe what other contributors are doing and learn from their project as well.

Smaller debug info with constructor type homing

Mon, 05 Apr 2021 00:00:00 +0000

Constructor type homing for debug info

Background

Class type information is a large contributor to debug info size. Clang already has a few optimizations to reduce the size of class type information based on the assumption that debug info can be spread out over multiple compilation units. So, instead of emitting class type info in every compilation unit that references a class, we only really have to emit it in one place. (For all the other references, emitting the much smaller forward declaration of the class is sufficient.) As an example, one of the existing optimizations is vtable homing, where the type info for a dynamic C++ class is only emitted when its vtable is emitted.

Constructor type homing

Constructor homing is a similar optimization that applies to almost all classes with constructors. It emits type information for a class wherever its constructor definition is emitted. Unlike with vtable homing, the type info for a class could be emitted more than once, but it has a large impact on debug info because it applies to a large percentage of classes. If all of a class’s constructors are defined out of line, the class type information will only be emitted once. If there are constructors defined inline, the inline constructors, and therefore class type information, will be emitted in every compilation unit that calls a constructor.

Constructor homing assumes that if a user wants a class to have debug info, then that class was constructed somewhere in the program. This is a reasonable assumption to make, as classes viewed in the debugger probably exist in program memory, and any class that exists in memory must have been constructed.

Even though all classes have constructors, there are some types of classes that constructor homing doesn’t apply to: trivial classes, aggregate classes, and classes with constexpr constructors. It’s possible to create instances of these classes without emitting a constructor, so we can’t guarantee that the debug info will be emitted. However, these types of classes tend to be fairly small, so we would probably see less of an improvement from using constructor homing on them.
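To make the distinction concrete, here is a small sketch of the three kinds of classes discussed above. The class names are made up for illustration, and the comments restate the behaviour described in this post rather than anything checked against compiler output.

```cpp
#include <cassert>
#include <type_traits>

// Has a user-provided constructor that is defined out of line: with
// constructor homing, the full type info travels with that definition.
class Widget {
public:
    explicit Widget(int id);
    int id() const { return id_; }
private:
    int id_;
};
Widget::Widget(int id) : id_(id) {}

// Aggregate with only trivial special members: objects can come into
// existence without any constructor being emitted, so homing is not applied.
struct Point {
    int x;
    int y;
};

// constexpr constructor: objects may be built at compile time, again
// without an emitted constructor, so homing is not applied either.
struct Celsius {
    constexpr explicit Celsius(int d) : degrees(d) {}
    int degrees;
};

static_assert(std::is_trivially_default_constructible<Point>::value, "Point is trivial");
static_assert(Celsius(21).degrees == 21, "Celsius is usable in constant expressions");

inline void demo_homing_classes() {
    assert(Widget(7).id() == 7);
    Point p = {1, 2};
    assert(p.x + p.y == 3);
}
```

In a multi-file project, only the translation unit containing the definition of Widget::Widget would be expected to carry the full description of Widget; the others would get a forward declaration.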

Constructor homing can be enabled with -Xclang -fuse-ctor-homing. Eventually, the plan is to enable it by default in Clang so that it happens as part of -fno-standalone-debug. In terms of Clang’s -debug-info-kind= flags, constructor homing is implemented as -debug-info-kind=constructor, one level below -debug-info-kind=limited.

Size improvements

Emitting less class type info gives us a significant reduction in object file sizes. In a Chrome debug build on Linux (which uses split dwarf for debug info), .o and .dwo file sizes with constructor type homing are about 30% smaller (with a 20% overall reduction in build directory size). In a Clang debug build on Linux, .o file sizes are about 48% smaller (and the overall build directory is 38% smaller). On Windows, both Chrome and Clang had 37% smaller .obj files.

The smaller object file size also results in an improvement in link times and GDB load times. On Windows, linking Chrome with constructor homing is 6% faster, while linking Clang is 34% faster. On Linux there was no noticeable difference in link time in Chrome, but linking Clang is 25% faster.

Measured on my machine, without using --gdb-index, the GDB load time for Clang is about 2m30s without constructor homing, and 1 minute with. If --gdb-index is enabled, the GDB startup time is about a second regardless, and the binary size is about 30% smaller with constructor homing.

Potential pitfalls

Ideally, constructor homing shouldn’t change the debug info that’s available when debugging, but there are some cases where it does. Even though this is undefined behavior in C++, it is possible to define a class with a non-trivial constructor and create an instance of it without calling the constructor (this is often done in C code, where there are no constructors):

    Foo *p = static_cast<Foo *>(malloc(sizeof(Foo)));
    p->someField = 1;

The constructor for Foo is never called, so its debug info is never emitted.

After enabling constructor type homing in Chrome, we discovered that there are a few classes in libc++ that avoid calling the constructor, and for various reasons, that would be difficult to change. To ensure that they still have debug info, there’s a new attribute in Clang called [[standalone_debug]]. If a class has the attribute it will have the same debug info as if it were built with -fstandalone-debug. This can be used to get debug info for classes that otherwise would have had their type info omitted with constructor homing (or with any of the other debug info optimizations).
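Applying the attribute might look like the sketch below. The spelling [[clang::standalone_debug]] is an assumption about how the attribute is namespaced in Clang, so the example probes for it with __has_cpp_attribute and falls back to nothing on compilers that do not know it. The macro and the class name are hypothetical.

```cpp
#include <cassert>

// Use the attribute only where the compiler reports support for it.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::standalone_debug)
#    define EXAMPLE_STANDALONE_DEBUG [[clang::standalone_debug]]
#  endif
#endif
#ifndef EXAMPLE_STANDALONE_DEBUG
#  define EXAMPLE_STANDALONE_DEBUG
#endif

// A class with a non-trivial constructor that some code might set up without
// ever calling that constructor; the attribute asks for full debug info anyway.
struct EXAMPLE_STANDALONE_DEBUG RawRecord {
    RawRecord() : someField(0) {}
    int someField;
};

inline void demo_standalone_debug() {
    RawRecord r;
    assert(r.someField == 0);
}
```

The attribute only affects the emitted debug info; the behaviour of the class itself is unchanged.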

I’ve also looked into whether there are other common cases where constructor homing omits debug info. Manually comparing the debug info available in a Clang build showed some missing types. There were a few classes that were not used anywhere. There were also one or two pseudo-namespace classes that only had static methods and were therefore never constructed. Looking at diffs of the debug info is somewhat difficult given how many object files and types there are (most of the missing types I saw were because the type wasn’t constructed in the particular binary or set of object files I compared), so there may have been other cases I missed.

Summary

Constructor type homing is a new optimization that greatly reduces the size of debug info in object files. Currently it can be enabled with the cc1 flag -fuse-ctor-homing, and the plan is to enable it by default as part of -fno-standalone-debug in Clang. If you want to make your debug builds smaller, try adding -Xclang -fuse-ctor-homing to your build and let us know how much object file size it saves.

Women in Compilers and Tools Meetup Series

Wed, 31 Mar 2021 00:00:00 +0000

As today is the last day of Women’s History Month, it seems fitting to announce a new meetup series for Women in Compilers and Tools.

The LLVM Women in Compilers and Tools Meetup Series is a free virtual event held each month. It is a platform where all women (trans, non-binary, and cis), at various stages in their careers, can speak openly, discuss, and network with others. The series will feature talks, tutorials, and mentoring events, will regularly highlight individuals for their contributions to the compiler, programming languages, and tools field, and will offer continued discussion at the end of each event.

This series is organized by The Women in Compilers and Tools (WiCT) Community.o group. This working group is composed of volunteers in the LLVM community and supported by the LLVM Foundation.

In this series launch, we will be hearing from The Women in Compilers and Tools (WiCT) working group. This working group is composed of volunteers in the LLVM community who’ve put together this series. The WiCT working group will provide a Birds of a Feather style talk where attendees can learn more about the series and what to look forward to in the coming months! This will be a great opportunity to ask questions and network with LLVM community members and enthusiasts.

The first meetup will occur on Thursday, April 22, 2021 at 6pm PDT and feature the following members of the WiCT working group:

Anupama Chandrasekhar, NVIDIA

Anupama is a Software Engineer at Nvidia working on graphics drivers, compilers, self-driving cars and other cool technologies. Prior to this she was a graphics software engineer at Intel working on the integrated GPU. Her prior work includes graphics/compute compiler development for GPUs, DX and Metal driver development and GPU performance. Her main interests are programming languages and compilers. She received her MS in Computer Science and Electrical Engineering from Pennsylvania State University and B.E in Electronics from Anna University, Chennai, India.

Cyndy Ishida, Apple

Cyndy Ishida has been a Compiler Engineer at Apple, Inc. since 2019 concentrating on library support with Clang tooling. Prior to working at Apple, she completed internships at Virtu Financial, Microsoft and Facebook related to C++ development. Cyndy is relatively new to the LLVM community and began her involvement by contributing Mach-O Support to TextAPI, which serves as a condensed textual representation of dynamic libraries from a linking perspective. She is additionally a Board Member for the LLVM Foundation and aided in launching Community.o, The Foundation’s Diversity & Inclusion initiative and co-organized The Community.o Summit.

Tanya Lattner, LLVM Foundation

Tanya Lattner is the President and Chief Executive Officer of the LLVM Foundation, a nonprofit supporting the open source software project LLVM (llvm.org). As CEO, Tanya designs programs to support the LLVM project through educational events such as developers conferences and workshops, student support through travel grants, community outreach, and increasing diversity within the project through the Community.o Program. She has a Bachelor of Science in Electrical Engineering from the University of Portland and a Master’s degree in Computer Science from the University of Illinois Urbana-Champaign. Tanya has over 10 years of experience as a software engineer primarily focusing on compilers and related tools. She also has 5 patents from her work on code obfuscation, which use compiler techniques to prevent tampering or reverse engineering by hackers.

Jubi Taneja, University of Utah

Jubi Taneja is a PhD candidate at the University of Utah. She will graduate and go on to work full-time with the Machine Learning Compiler group at Microsoft Research starting in the summer of 2021. Her research broadly focuses on compiler optimizations, correctness, and static analysis, with the goal of helping compiler developers use formal methods. She started learning about compilers at IIT Bombay as an undergraduate research fellow. She earned her B.E. with a Gold Medal from Punjabi University, India. She has been a SIGPLAN Long-Term Mentor for international PL researchers since summer 2020. She has been mentoring high school and undergraduate students from India for the past 10 years.

If you are interested in attending, please register here.

The New Pass Manager

Fri, 26 Mar 2021 00:00:00 +0000

LLVM’s New Pass Manager

What is a pass manager?

A pass manager schedules transformation passes and analyses to be run on IR in a specific order. Passes can run on an entire module, a single function, or something more abstract such as a strongly connected component (SCC) in a call graph or a loop inside of a function. Scheduling can be simple, such as running a list of module passes, or running function passes on every function inside a module. Scheduling can also be more involved, such as making sure we visit SCCs in the call graph in the correct order.

A pass manager is also responsible for managing analysis results. Analyses (e.g. dominator tree) should be shared between passes whenever possible for efficiency reasons, since recomputing analyses can be expensive. To do so, the pass manager must cache results and recompute them when they are invalidated by transforms.

For testing purposes, we can add specific passes to a pass manager to test those passes. However, the typical use case is to run a predetermined pass pipeline. For example, clang -O2 runs a predetermined set of passes on the input IR.

What is LLVM’s new pass manager?

LLVM currently has two separate pass managers: the legacy pass manager (legacy PM) and the new pass manager (new PM). When referring to “legacy PM” and “new PM”, this includes all of the surrounding infrastructure, not just the entity that manages passes.

The legacy PM has been in use for a very long time and did its job fairly well. However, there were some missing features required for better optimization opportunities, most notably the ability to use function analysis results for arbitrary functions from the inliner. The specific motivating use case was that the inliner wanted to look at the profile data of callees recursively, especially in regards to deferred inlining where the inliner wants to look through simple “wrapper” functions. The legacy PM did not support retrieval of analyses for arbitrary functions in a CGSCC pass. A CGSCC pass runs on a specific strongly connected component (SCC) of the call graph. The pass manager makes sure we visit SCCs bottom-up so that callees are as optimized as possible when we get to their callers and callers have as precise information as possible. LLVM’s inliner is a CGSCC pass due to being a bottom-up inliner. This major limitation of the legacy PM, along with other warts, prompted the desire for a new pass manager.

Currently the new PM applies only to the middle-end optimization pipeline working with LLVM IR. The backend codegen pipeline still works only with the legacy PM, mostly because most codegen passes don’t work on LLVM IR, but rather machine IR (MIR), and nobody has yet put in the time to create the new PM infrastructure for MIR passes and to migrate all of the backends to use the new PM. Migrating to the new PM for the codegen pipeline likely won’t unlock performance gains since there are almost no interprocedural codegen passes. However, it would clean up a lot of technical debt.

Design

With the legacy PM, each pass declares which analyses it requires and preserves, and the pass manager schedules those analyses as passes to be run if they aren’t currently cached or have been invalidated. Declaring ahead of time which analyses a pass may need is unnecessary boilerplate, and a pass might not end up using all analyses in all cases.

The new PM takes a different approach of completely separating analyses and normal passes. Rather than having the pass manager take care of analyses, a separate analysis manager is in charge of computing, caching, and invalidating analyses. Passes can simply request an analysis from the analysis manager, allowing for lazily computing analyses. In order for a pass to communicate that analyses have been invalidated, it returns which analyses it has preserved. The pass manager tells the analysis manager to handle invalidated cached analyses. This results in less boilerplate and better separation of concerns between passes and analyses.
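The split between passes and the analysis manager can be sketched in plain C++. This is a toy model under stated assumptions, not LLVM's API: the names AnalysisManager, Preserved, getInstCount, countingPass, removeNopsPass and runPasses, and the "inst-count" analysis itself, are all invented for illustration. It shows a pass requesting an analysis lazily, returning the set of analyses it preserved, and the pass manager forwarding that set so stale results are dropped.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy IR: a function is a name plus a list of instruction strings.
struct Function {
  std::string name;
  std::vector<std::string> insts;
};

// The set of analysis names a pass promises are still valid after it ran.
using Preserved = std::set<std::string>;

// Toy analysis manager: computes results lazily and caches them per function.
class AnalysisManager {
public:
  int computations = 0; // how often the analysis was actually computed

  // Hypothetical analysis "inst-count": number of instructions in F.
  int getInstCount(const Function &F) {
    assert(!F.name.empty() && "results are cached by function name");
    auto key = std::make_pair(std::string("inst-count"), F.name);
    auto it = cache_.find(key);
    if (it != cache_.end())
      return it->second; // cache hit, nothing is recomputed
    ++computations;
    int result = static_cast<int>(F.insts.size());
    cache_[key] = result;
    return result;
  }

  // Drop every cached result for F that the pass did not preserve.
  void invalidate(const Function &F, const Preserved &kept) {
    for (auto it = cache_.begin(); it != cache_.end();) {
      bool sameFunction = it->first.second == F.name;
      bool preserved = kept.count(it->first.first) != 0;
      it = (sameFunction && !preserved) ? cache_.erase(it) : std::next(it);
    }
  }

private:
  std::map<std::pair<std::string, std::string>, int> cache_;
};

// A read-only pass: it requests the analysis and preserves it.
Preserved countingPass(Function &F, AnalysisManager &AM) {
  AM.getInstCount(F);
  return {"inst-count"};
}

// A transforming pass: if it removes anything, it preserves nothing.
Preserved removeNopsPass(Function &F, AnalysisManager &AM) {
  if (AM.getInstCount(F) == 0)
    return {"inst-count"};
  auto &I = F.insts;
  auto newEnd = std::remove(I.begin(), I.end(), std::string("nop"));
  bool changed = newEnd != I.end();
  I.erase(newEnd, I.end());
  return changed ? Preserved{} : Preserved{"inst-count"};
}

// Toy function pass manager: run each pass, then forward its preserved set.
using FunctionPass = Preserved (*)(Function &, AnalysisManager &);
void runPasses(Function &F, AnalysisManager &AM,
               const std::vector<FunctionPass> &passes) {
  for (FunctionPass P : passes)
    AM.invalidate(F, P(F, AM));
}
```

Running countingPass twice on the same function computes the analysis once; running removeNopsPass in between forces one recomputation, because that pass reports that it preserved nothing.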

Since the legacy PM modelled analyses as passes to be scheduled and run, we can’t efficiently access analyses to arbitrary functions. For a function analysis, the corresponding analysis pass will only contain the info for the current function, which is created during the latest run of the analysis pass. We can manually create analyses for other functions, but they won’t be cached anywhere, leading to lots of redundant work and unacceptable compile time regressions. Since analyses are handled by an analysis manager in the new PM, the analysis manager can cache arbitrary analyses for arbitrary functions.

To support CGSCC analyses, we need a key to cache analyses. For things like functions and loops, we have persistent data structures for those to use as keys. However, the legacy CGSCC pass manager only stored the functions in the current SCC in memory and did not have a persistent call graph data structure to use as keys to cache analyses. So we need to keep the whole graph in memory to have something to use as a key. And if we have a persistent call graph, we need to make sure it is up to date if passes change its structure. To avoid too much redundant work regenerating a potentially large but sparse graph, we need to incrementally update the graph. This is the reason behind the complexity of the CGSCC pass manager in the new PM.

Within an SCC, a transform might break a call graph cycle and split the SCC. One issue with the legacy CGSCC infrastructure is that it simply stores all the functions in the current SCC in an array, then iterates through the functions in that order without ever revisiting functions. Consider the following SCC containing two functions.

void foo() {
  bar();
}
void bar() {
  if (false) {
    foo();
  }
}

Say we first visit foo, then visit bar and remove the dead call.

void foo() {
  bar();
}
void bar() {}

We now want to revisit foo since we have better information, most notably that foo is in its own SCC. The legacy CGSCC pass manager would simply move on to the next part of the call graph. So as part of the new PM’s incremental call graph update, if an SCC is split, we make sure to visit the newly split SCCs bottom-up. This may involve revisiting a function we have already visited, but that is intentional, so that passes get a chance to observe more precise information.
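To make the SCC split concrete, here is a small standalone sketch that computes SCCs for the foo/bar call graph before and after the dead call is removed. It uses Tarjan's algorithm and recomputes everything from scratch, which is an assumption made for brevity and not how the new PM's incremental call graph update works. The names CallGraph, TarjanSCC and bottomUpSCCs are illustrative. Tarjan's algorithm emits an SCC only after every SCC reachable from it, so with caller-to-callee edges the result is already in bottom-up order.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Call graph as caller -> list of callees. Every function must be a key.
using CallGraph = std::map<std::string, std::vector<std::string>>;
using SCC = std::vector<std::string>;

class TarjanSCC {
public:
  explicit TarjanSCC(const CallGraph &G) : G_(G) {}

  // Returns SCCs with callees before callers (bottom-up).
  std::vector<SCC> run() {
    for (const auto &entry : G_)
      if (!index_.count(entry.first))
        visit(entry.first);
    return sccs_;
  }

private:
  void visit(const std::string &v) {
    index_[v] = low_[v] = next_++;
    stack_.push_back(v);
    onStack_[v] = true;

    auto it = G_.find(v);
    assert(it != G_.end() && "every callee must also be a key in the graph");
    for (const std::string &w : it->second) {
      if (!index_.count(w)) {
        visit(w); // tree edge: recurse, then propagate the low link
        low_[v] = std::min(low_[v], low_[w]);
      } else if (onStack_[w]) {
        low_[v] = std::min(low_[v], index_[w]); // edge back into the stack
      }
    }

    if (low_[v] == index_[v]) { // v is the root of an SCC: pop its members
      SCC scc;
      std::string w;
      do {
        w = stack_.back();
        stack_.pop_back();
        onStack_[w] = false;
        scc.push_back(w);
      } while (w != v);
      std::sort(scc.begin(), scc.end());
      sccs_.push_back(scc);
    }
  }

  const CallGraph &G_;
  int next_ = 0;
  std::map<std::string, int> index_, low_;
  std::map<std::string, bool> onStack_;
  std::vector<std::string> stack_;
  std::vector<SCC> sccs_;
};

std::vector<SCC> bottomUpSCCs(const CallGraph &G) { return TarjanSCC(G).run(); }
```

With the edges foo -> bar and bar -> foo, bottomUpSCCs returns a single SCC containing both functions. Once the bar -> foo edge is gone it returns two SCCs, bar first and then foo, which is the order in which a bottom-up visitor would want to revisit them.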

When adding passes to the legacy pass manager, the nesting of different pass types is implicit. For example, adding function passes after a module pass implicitly creates a function pass manager over a contiguous list of function passes. This is fine in theory, although it can be a little confusing. And some pipelines want to run a CGSCC pass independently of a function pass that comes right after, rather than nesting the function pass into the CGSCC pass via a CGSCC pass manager. The new PM makes the nesting more explicit by only allowing pass managers to contain passes of the equivalent type. For example, a function pass manager can only contain function passes. To add a loop pass to a function pass manager, the loop pass must be wrapped in a loop-to-function adaptor to turn it into a function pass. The IR nesting in the new PM is module (-> CGSCC) -> function -> loop, where the CGSCC nesting is optional. Requiring the CGSCC nesting was considered to simplify things, but the extra runtime overhead of building the call graph and the extra code for proper nesting to run function passes was enough to make the CGSCC nesting optional.
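The explicit nesting can be illustrated with a toy model in standard C++. Everything here is a simplified stand-in written for this sketch (PassManager, moduleToFunctionAdaptor and the Function/Module structs are not LLVM's classes). The point is only that a pass manager is typed on its IR unit, so a function pass cannot be added to a module pass manager without first being wrapped in an adaptor.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy IR units.
struct Function {
  std::string name;
  int visits; // how many function passes have run on this function
};
struct Module {
  std::vector<Function> functions;
};

// A pass manager only accepts passes that run on its own IR unit type.
template <typename IRUnit> class PassManager {
public:
  void addPass(std::function<void(IRUnit &)> P) {
    assert(P && "cannot add an empty pass");
    passes_.push_back(std::move(P));
  }
  void run(IRUnit &IR) const {
    for (const auto &P : passes_)
      P(IR);
  }

private:
  std::vector<std::function<void(IRUnit &)>> passes_;
};

using FunctionPassManager = PassManager<Function>;
using ModulePassManager = PassManager<Module>;
using ModulePass = std::function<void(Module &)>;

// Adaptor: turns a function pass manager into a module pass that runs the
// nested pipeline on every function in the module.
ModulePass moduleToFunctionAdaptor(FunctionPassManager FPM) {
  return [FPM](Module &M) {
    for (Function &F : M.functions)
      FPM.run(F);
  };
}
```

Calling MPM.addPass with a lambda that takes a Function& does not compile in this model, since it is not callable with a Module&; wrapping a FunctionPassManager with moduleToFunctionAdaptor is the only way to nest it, which mirrors the module -> function step of the nesting described above.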

The legacy pass manager relies on many global flags and registries. This is supported by macros generating functions and variables to initialize passes, and any users of the legacy pass manager must make sure to call a function to initialize these passes. But we need some way for a pass manager builder to be aware of all passes for testing purposes. The way the new PM does this is by having the pass manager builder include the definitions of all passes, then use a large mapping of pass IDs to pass constructors to create a function that parses a textual description of a pipeline and adds passes. Users of a pass manager builder can add plugins that register parsing callbacks to handle custom out-of-tree passes. Although there is a global list of functions, there is no mutable global state since each pass manager builder can parse pass pipelines without going through a global registry. Other options, like debugging the execution of a pass manager, are also specified via the constructor, and not through a global flag.
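As a rough illustration of the name-to-constructor mapping and the plugin parsing callbacks, the following sketch parses a comma-separated pipeline string. It is a toy under stated assumptions: the PassBuilder class here is not LLVM's, the "IR" is just a string, and the pass names upper-a and append-x are made up. Note that each builder owns its own tables, so no global registry is consulted.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy pass: edits a textual stand-in for the IR.
using Pass = std::function<void(std::string &)>;
// A parsing callback appends a pass and returns true if it knows the name.
using ParsingCallback =
    std::function<bool(const std::string &, std::vector<Pass> &)>;

class PassBuilder {
public:
  PassBuilder() {
    // Built-in table of pass names to constructors (names are hypothetical).
    builtins_["upper-a"] = [] {
      return Pass([](std::string &IR) {
        for (char &c : IR)
          if (c == 'a')
            c = 'A';
      });
    };
    builtins_["append-x"] = [] {
      return Pass([](std::string &IR) { IR += "x"; });
    };
  }

  // Plugins register callbacks to handle out-of-tree pass names.
  void registerParsingCallback(ParsingCallback CB) {
    assert(CB && "callback must be callable");
    callbacks_.push_back(std::move(CB));
  }

  // Parses "name1,name2,..." and appends the passes to out.
  // Returns false as soon as a name is unknown to builtins and plugins.
  bool parsePipeline(const std::string &text, std::vector<Pass> &out) const {
    std::istringstream in(text);
    std::string name;
    while (std::getline(in, name, ',')) {
      auto it = builtins_.find(name);
      if (it != builtins_.end()) {
        out.push_back(it->second());
        continue;
      }
      bool handled = false;
      for (const auto &CB : callbacks_) {
        if (CB(name, out)) {
          handled = true;
          break;
        }
      }
      if (!handled)
        return false;
    }
    return true;
  }

private:
  std::map<std::string, std::function<Pass()>> builtins_;
  std::vector<ParsingCallback> callbacks_;
};
```

A caller would construct a PassBuilder, optionally register a callback for its own pass names, and then call parsePipeline with a string such as "upper-a,append-x" to obtain the list of passes to run.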

There has been a desire to parallelize LLVM passes for a long time. Although the pass manager infrastructure is not the only blocker, the legacy PM did have a couple of issues blocking parallelization.

At the call graph level, only sibling SCCs can be parallelized. Creating SCCs on demand makes it hard to find sibling SCCs. The new PM’s computation of the entire call graph makes it easy to find sibling SCCs to parallelize SCC passes on.
Module analyses can be computed from function passes in the legacy PM. Some passes only use analyses if they are cached, so parallelization can cause non-determinism since a module analysis may or may not exist based on other parallel pipelines. The new PM only allows function passes to access cached module analyses and does not allow running them. This has the downside of needing to make sure that certain higher-level analyses are present before running a lower-level pipeline, e.g. making sure GlobalsAA has been computed before running a function pipeline.

Making the new pass manager the default pass manager

Some major users of LLVM switched to using the new PM by default many years ago. There were some efforts upstream to make the new PM work for all use cases. For example, all Clang tests had been passing with the new PM for a while. However, a vast majority of LLVM tests were still only testing the legacy PM. opt, the program typically used to test passes, had syntax to run passes using the legacy PM, opt -instcombine, and syntax to run passes using the new PM, opt -passes=instcombine. The vast majority of tests used the legacy PM syntax, so if the new PM were to be switched on by default, most LLVM tests wouldn’t be testing the new PM passes (although a good number of tests already manually ran against both).

To make tests using opt run against the new PM, we can either manually make them run twice, once against the legacy PM and once against the new PM, or we can automatically translate opt -instcombine to opt -passes=instcombine when the new PM is on by default. Rather than update every test, an -enable-new-pm option was added to opt, which translates the legacy syntax to the new syntax.

With this new option, we started discovering what features the legacy PM had that existing users of the new PM weren’t concerned with. Turning this on locally of course initially caused many tests to fail. Many passes hadn’t yet been ported to the new PM and some opt features didn’t work with the new PM. We ported passes and features that made sense to port to the new PM, and pinned tests using legacy PM features that didn’t make sense to port to the new PM.

Some of the more interesting issues with the new PM uncovered with -enable-new-pm:

The optnone function attribute didn’t cause optional passes to be skipped. The existing pass instrumentation framework calls callbacks before and after running a pass and also allows passes to be skipped, so this was very simple to implement as a pass instrumentation. However, some passes must be run to preserve correctness, so we ended up marking some passes as required.
Opt-bisect wasn’t supported in the new PM. It is used for bisecting which pass in a pipeline causes a miscompile by skipping passes after a certain point. This was similarly easy to implement via pass instrumentation, and it likewise always runs required passes.
Various target-specific tests were failing. Upon inspection, some passes that were expected to be run in something like the -O2 pipeline weren’t being run. Some backend targets add custom passes into the default pipelines. Some of these passes are required for correctness, such as passes to lower target-specific intrinsics. The legacy PM had a way for a TargetMachine to inject passes into default pipelines via TargetMachine::adjustPassManager(). A new PM equivalent was introduced and the target-specific passes in the optimization pipeline were ported to the new PM. This wasn’t previously an issue because existing users of the new PM were mostly concerned with x86, which didn’t use this feature in the legacy PM.
Some coroutine tests were asserting in the CGSCC infrastructure. It turns out that the new PM CGSCC infrastructure didn’t support extracting parts of a function into another (aka outlining) in a CGSCC pass. There were some initial failed attempts at hacks to work around this issue, which didn’t properly update the call graph and didn’t handle recursion from newly outlined functions. Finally we came up with a solution that fit into the existing CGSCC infrastructure and properly kept the call graph valid, although coroutine-specific call graph transformations had to be accommodated.
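The optnone and opt-bisect items above both rely on the same idea: a callback that runs before each pass and may ask for the pass to be skipped, unless the pass is marked as required. The sketch below models that idea in standard C++. It is a simplified illustration, not LLVM's PassInstrumentation API; the struct and function names are invented, and lower-intrinsics is a hypothetical required pass standing in for the target-specific lowering passes mentioned above.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Function {
  std::string name;
  bool optnone;                 // models the optnone function attribute
  std::vector<std::string> log; // names of the passes that actually ran
};

struct Pass {
  std::string name;
  bool required; // required passes run even if a callback asks to skip
};

// A before-pass callback returns false to request that the pass be skipped.
using BeforePassCallback = std::function<bool(const Pass &, const Function &)>;

class PassInstrumentation {
public:
  void registerBeforePass(BeforePassCallback CB) {
    assert(CB && "callback must be callable");
    callbacks_.push_back(std::move(CB));
  }

  // Required passes always run and are not shown to the skip callbacks.
  bool shouldRun(const Pass &P, const Function &F) const {
    if (P.required)
      return true;
    bool run = true;
    for (const auto &CB : callbacks_)
      run = CB(P, F) && run; // call every callback, even after a "skip"
    return run;
  }

private:
  std::vector<BeforePassCallback> callbacks_;
};

// optnone: skip optional passes on functions carrying the attribute.
bool optnoneAllowsPass(const Pass &, const Function &F) { return !F.optnone; }

// Bisection: let the first `limit` optional passes run, skip the rest.
BeforePassCallback makeBisect(int limit) {
  auto count = std::make_shared<int>(0);
  return [count, limit](const Pass &, const Function &) {
    return ++*count <= limit;
  };
}

void runPipeline(Function &F, const std::vector<Pass> &pipeline,
                 const PassInstrumentation &PI) {
  for (const Pass &P : pipeline)
    if (PI.shouldRun(P, F))
      F.log.push_back(P.name); // stand-in for actually running the pass
}
```

In this model, an optnone function run through the pipeline instcombine, lower-intrinsics, gvn only sees lower-intrinsics, and a bisect limit of 1 lets instcombine and the required lower-intrinsics run while gvn is skipped.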

Improvements

Various projects/companies have already been using the new PM for performance reasons for many years. Separately, Chrome recently started using PGO and ThinLTO to make Chrome faster, each with noticeable performance wins. After the new PM was turned on by default in LLVM, Chrome followed suit and turned on the new PM, seeing 3-4% improvements in Speedometer 2.0 for Linux and Windows, on top of an 8-9MB size decrease. It’s likely that better usage of profile information as well as better handling of larger ThinLTO call graphs led to these improvements.

However, smaller applications with tiny hotspots likely won’t see much benefit from the new PM since the improvements brought on by the new PM tend to be more relevant to large codebases.

Aside from user-facing improvements, this also helps LLVM’s code health by standardizing on one of the two pass managers for the optimization. While we can’t yet remove the legacy pass manager, we can start the deprecation of it, at least for the optimization pipeline. Then hopefully at some point we can start to remove parts of the optimization pipeline that are legacy PM-specific.

What’s next?

To begin the process of removing the use of the legacy PM in the optimization pipeline, we need to make sure that anything using the legacy PM has an alternative using the new PM. Just to list a couple: bugpoint, the LLVM C API, GPU divergence analysis.

As mentioned before, the codegen pipeline still only works with the legacy PM. Although there has been work to start making the codegen pipeline work with the new PM, it is still very far from being usable. This is a great entry point into LLVM; please ask on llvm-dev for more information if you’re interested.

Cling -- Beyond Just Interpreting C++

Thu, 25 Mar 2021 10:00:00 +0000

Interactive C++ with Cling

In our previous blog post “Interactive C++ for Data Science” we described eval-style programming, interactive C++ in Notebooks and CUDA. This post will discuss some developed applications of Cling supporting interoperability and extensibility. We aim to demonstrate template instantiation on demand; embedding Cling as a service; and showcase an extension enabling on-the-fly automatic differentiation.

Template Instantiation on Demand

Cling implements a facility called LookupHelper, which takes C++ code and checks if a declaration with that qualified name already exists. For instance:

[cling] #include "cling/Interpreter/Interpreter.h"
[cling] #include "cling/Interpreter/LookupHelper.h"
[cling] #include "clang/AST/Decl.h"
[cling] struct S{};
[cling] cling::LookupHelper& LH = gCling->getLookupHelper()
(cling::LookupHelper &) @0x7fcba3c0bfc0
[cling] auto D = LH.findScope("std::vector<S>", cling::LookupHelper::DiagSetting::NoDiagnostics)
(const clang::Decl *) 0x1216bdcd8
[cling] D->getDeclKindName()
(const char *) "ClassTemplateSpecialization"

In this particular case, findScope instantiates the template and returns its clang AST representation. Template instantiation on demand addresses the common library problem of template combinatorial explosion. Template instantiation on demand and conversion of textual qualified C++ names into entity meta information has proven to be a very powerful mechanism aiding data serialization and language interoperability.

Language Interop on Demand

An example is cppyy, which provides automatic Python bindings, at runtime, to C++ code through Cling. Python is itself a dynamic language executed by an interpreter, thus making the interaction with C++ code more natural when intermediated by Cling. Examples include runtime template instantiations, function (pointer) callbacks, cross-language inheritance, automatic downcasting, and exception mapping. Many advanced C++ features such as placement new, multiple virtual inheritance, variadic templates, etc., are naturally resolved by the LookupHelper.

cppyy achieves high performance through an all-lazy approach to runtime bindings construction and specializations of common cases through runtime reflection. As such, it has a much lower call overhead than e.g. pybind11, and looping over a std::vector through cppyy is faster than looping over a numpy array of the same type. Taking it a step further, its implementation for PyPy, a fully compatible Python interpreter sporting a tracing JIT, can in many cases provide native access to C++ code in PyPy’s JIT, including overload resolution and JIT hints that allow for aggressive optimizations.

Thanks to Cling’s runtime reflection, cppyy makes maintaining a large software stack simpler: except for cppyy’s own python-interpreter binding, it does not have any compiled code that is Python-dependent. I.e., cppyy-based extension modules require no recompilation when switching Python versions (or even when switching between the CPython and PyPy interpreters, say).

The example below shows the tight integration of C++ and Python; the tight back-and-forth communication between template instantiation and the cross-inheritance overrides; and the runtime behaviors (everything happens at runtime, there is no compiled code here).

import cppyy
cppyy.cppdef(r"""\
template<typename T> class Producer {
private:
  T m_value;
protected:
  virtual T produce_imp() = 0;
public:
  Producer(const T& value) : m_value(value) {}
  virtual ~Producer() {}
  T produce_total() { return m_value + produce_imp(); }
};
class Consumer {
public:
  template<typename T>
  void consume(Producer<T>& p) {
    std::cout << "received: \"" << p.produce_total() << "\"\n";
  }
};""")
def factory(base_v, *derived_v):
  class _F(cppyy.gbl.Producer[type(base_v)]):
    def __init__(self, base_v, *derived_v):
      super().__init__(base_v)
      self._values = derived_v
    def produce_imp(self):
      return type(base_v)(sum(self._values))
  return _F(base_v, *derived_v)
consumer = cppyy.gbl.Consumer()
for producer in [factory(*x) for x in \
                 (("hello ", 42), (3., 0.14, 0.0015))]:
  consumer.consume(producer)

Output:

python3 cppyy_demo.py
received: "hello 42"
received: "3.1415"

In the snippet we create python classes based on python arguments, which derive from a templated C++ class instantiated with a type. The python class provides the implementation for a protected function that is called from a public function, resulting in the expected return value, which is printed. We aim to highlight:

Python creates classes at runtime, as can Cling, even when they are declared in a module (the relevant classes are here created in a factory method);
Templated C++ classes can be instantiated on the fly from Python, by taking the type of the argument (i.e. using introspection at runtime in Python) to create the C++ base class for the Python class;
Cross-language derivation is at runtime, with no support needed from the C++ class, other than a virtual destructor and virtual methods;
C++ “protected” methods can be overridden in Python, even though Python has no such concept and you cannot actually call protected methods from bound C++ objects in Python;
It all works straight out of the box.

cppyy is used in several large code bases in physics, chemistry, mathematics, and biology. It is readily installable through pip from PyPI (https://pypi.org/project/cppyy/) and through conda.

Another example is Symmetry Integration Language (SIL), a D-based domain-specific language of functional flavor developed and used internally by Symmetry Investments. One of the main goals of SIL is to be easily interoperable with all sorts of languages and systems, and this is achieved through various plugins. To call C++ code, SIL uses a plugin called sil-cling, which acts as a middle ground between SIL and Cling. However, sil-cling does not interact directly with Cling, but through cppyy-backend, which is cppyy’s C/C++ wrapper around Cling that provides a stable C/C++ reflection API.

There are two core types that are exposed from sil-cling to SIL. One is CPPNamespace, which exposes a C++ namespace and allows free function calling, access to namespace’s variables, and object instantiation for the classes defined in that namespace. The other is ClingObj, which is a proxy for a C++ object, allowing construction, method calling and the manipulation of the object’s data members. Given that cppyy represents C++ classes, structs and namespaces as ‘scopes’ and reflection information about any of these C++ entities is obtained through its associated ‘scope’ object, both wrapper types exposed to SIL hold a reference to their associated scope object which is queried whenever the wrapper types are used to call C++ code.

All the calls that are done from SIL through the two wrapper types have 3 arguments: the wrapper object used, the name of the C++ function that needs to be called, and (if needed) a sequence of arguments for that function. Once the overload resolution and the argument conversion are done, sil-cling calls the appropriate cppyy function that will wrap the call and dispatch it to Cling for JIT compilation. At the moment, sil-cling can be used to call C++ libraries like Boost.Asio, dlib or Xapian.

The example below creates a Boost Asio-based client-server application written in SIL using the sil-cling plugin. Server.sil contains the SIL code for the server. It starts by including the relevant header files, using cppCompile. The next step is to create wrapper objects for the namespaces that are needed, and this is done by calling cppNamespace with the names of the namespaces that one needs to access. These CPPNamespace wrappers are used to instantiate classes that are defined inside the C++ namespaces that they wrap. Using these wrappers, an endpoint, an acceptor socket (which listens for incoming connections) and an active socket (which handles communication with the client) are created. Then the server waits for a connection and, once a client connects, it reads its message and sends a reply.

// Server.sil
import * from silcling
import format from format
cppCompile ("#include <boost/asio.hpp>")
cppCompile ("#include \"helper.hpp\"")
// CPPNamespace wrappers
asio = cppNamespace("boost::asio")
tcp = cppNamespace("boost::asio::ip::tcp")
helpers = cppNamespace("helpme")
// Using namespace wrappers to instantiate classes - creates ClingObj(s)
ioService = asio.obj("io_service")
endpoint = tcp.obj("endpoint", tcp.v4(), 9999)
// Acceptor socket - incoming connections
acceptorSocket = tcp.obj("acceptor", ioService, endpoint)
// Active socket - communication with client
activeSocket = tcp.obj("socket", ioService)
// Waiting for connection and use the activeSocket to connect with the client
helpers.accept(acceptorSocket, activeSocket)
// Waiting for message
message = helpers.getData(activeSocket);
print(format("[Server]: Received \"%s\" from client.", message.getString()))
// Send reply
reply = "Hello \'" ~ message.getString() ~ "\'!"
helpers.sendData(activeSocket, reply)
print(format("[Server]: Sent \"%s\" to client.", reply))

Client.sil contains the SIL code for the client. Like the server, it includes the relevant headers, creates wrappers for the required namespaces and uses them to create an endpoint and a socket. Then the client connects to the server, sends a message, and waits for the server’s reply. Once the reply arrives, the client prints it to the screen.

// Client.sil
import * from silcling
import format from format
cppCompile ("#include <boost/asio.hpp>")
cppCompile ("#include \"helper.hpp\"")
asio = cppNamespace("boost::asio")
tcp = cppNamespace("boost::asio::ip::tcp")
helpers = cppNamespace("helpme")
// Scope resolution operator <-> address::static_method() or address::static_member
address = classScope("boost::asio::ip::address")
ioService = asio.obj("io_service")
endpoint = tcp.obj("endpoint", address.from_string("127.0.0.1"), 9999)
// Creating socket
client_socket = tcp.obj("socket", ioService)
// Connect
client_socket.connect(endpoint)
message = "demo"
helpers.sendData(client_socket, message)
print(format("[Client]: Sent \"%s\" to server.", message))
message = helpers.getData(client_socket);
print(format("[Client]: Received \"%s\" from server.", message.getString()))

Output:

[Client]: Sent "demo" to server.
[Server]: Received "demo" from client.
[Server]: Sent "Hello demo" to client.
[Client]: Received "Hello demo" from server.

Interpreter/Compiler as a Service

The design of Cling, just like Clang, allows it to be used as a library. In the next example we show how to incorporate libCling in a C++ program. Cling can be used on-demand, as a service, to compile, modify or describe C++ code. The example program shows several ways in which compiled and interpreted C++ can interact:

callCompiledFn – The cling-demo.cpp defines an int global variable, aGlobal; a static float variable, anotherGlobal; and its accessors. The interp argument is an earlier created instance of the Cling interpreter. Just like in standard C++, it is sufficient to forward declare the compiled entities to the interpreter to be able to use them. Then, the execution information from the different calls to process is stored in a generic Cling Value object which is used to exchange information between compiled and interpreted code.
callInterpretedFn – Complementing callCompiledFn, compiled code can call an interpreted function by asking Cling to form a function pointer from a given mangled name. Then the call uses the standard C++ syntax.
modifyCompiledValue – Cling has full understanding of C++ and thus we can support complex low-level operations on stack-allocated memory. In the example we ask the compiler for the memory address of the local variable loc and ask the interpreter, at runtime, to square its value.

// cling-demo.cpp
// g++ ... cling-demo.cpp; ./cling-demo
#include <cling/Interpreter/Interpreter.h>
#include <cling/Interpreter/Value.h>
#include <cling/Utils/Casting.h>
#include <iostream>
#include <string>
#include <sstream>
/// Definitions of declarations injected also into cling.
/// NOTE: this could also stay in a header #included here and into cling, but
/// for the sake of simplicity we just redeclare them here.
int aGlobal = 42;
static float anotherGlobal = 3.141;
float getAnotherGlobal() { return anotherGlobal; }
void setAnotherGlobal(float val) { anotherGlobal = val; }
///\brief Call compiled functions from the interpreter.
void callCompiledFn(cling::Interpreter& interp) {
  // We could use a header, too...
  interp.declare("int aGlobal;\n"
                 "float getAnotherGlobal();\n"
                 "void setAnotherGlobal(float val);\n");
  cling::Value res; // Will hold the result of the expression evaluation.
  interp.process("aGlobal;", &res);
  std::cout << "aGlobal is " << res.getAs<long long>() << '\n';
  interp.process("getAnotherGlobal();", &res);
  std::cout << "getAnotherGlobal() returned " << res.getAs<float>() << '\n';
  setAnotherGlobal(1.); // We modify the compiled value,
  interp.process("getAnotherGlobal();", &res); // does the interpreter see it?
  std::cout << "getAnotherGlobal() returned " << res.getAs<float>() << '\n';
  // We modify using the interpreter, now the binary sees the new value.
  interp.process("setAnotherGlobal(7.777); getAnotherGlobal();");
  std::cout << "getAnotherGlobal() returned " << getAnotherGlobal() << '\n';
}
/// Call an interpreted function using its symbol address.
void callInterpretedFn(cling::Interpreter& interp) {
  // Declare a function to the interpreter. Make it extern "C" to remove
  // mangling from the game.
  interp.declare("extern \"C\" int plutification(int siss, int sat) "
                 "{ return siss * sat; }");
  void* addr = interp.getAddressOfGlobal("plutification");
  using func_t = int(int, int);
  func_t* pFunc = cling::utils::VoidToFunctionPtr<func_t*>(addr);
  std::cout << "7 * 8 = " << pFunc(7, 8) << '\n';
}
/// Pass a pointer into cling as a string.
void modifyCompiledValue(cling::Interpreter& interp) {
  int loc = 17; // The value that will be modified
  // Update the value of loc by passing it to the interpreter.
  std::ostringstream sstr;
  // on Windows, to prefix the hexadecimal value of a pointer with '0x',
  // one need to write: std::hex << std::showbase << (size_t)pointer
  sstr << "int& ref = *(int*)" << std::hex << std::showbase << (size_t)&loc << ';';
  sstr << "ref = ref * ref;";
  interp.process(sstr.str());
  std::cout << "The square of 17 is " << loc << '\n';
}
int main(int argc, const char* const* argv) {
  // Create the Interpreter. LLVMDIR is provided as -D during compilation.
  cling::Interpreter interp(argc, argv, LLVMDIR);
  callCompiledFn(interp);
  callInterpretedFn(interp);
  modifyCompiledValue(interp);
  return 0;
}

Output:

./cling-demo
aGlobal is 42
getAnotherGlobal() returned 3.141
getAnotherGlobal() returned 1
getAnotherGlobal() returned 7.777
7 * 8 = 56
The square of 17 is 289

Crossing the compiled-interpreted boundary relies on the stability of Clang’s implementation of the host’s application binary interface (ABI). Over the years it has been very reliable for both Unix and Windows; however, Cling is heavily used to interact with GCC-compiled codebases and is sensitive to ABI incompatibilities between GCC and Clang with respect to the Itanium ABI specification.

Extensions

Just like Clang, Cling can be extended by plugins. The next example demonstrates embedded use of Cling’s extension for automatic differentiation, Clad. Clad transforms Clang’s AST to produce derivatives and gradients of mathematical functions. When creating the Cling instance we specify -fplugin and the path to the plugin itself. Then we define a target function, pow2, and ask for its derivative with respect to its first argument.

#include <cling/Interpreter/Interpreter.h>
#include <cling/Interpreter/Value.h>
#include <cstdio>  // printf
#include <vector>  // std::vector

// Derivatives as a service.
void gimme_pow2dx(cling::Interpreter &interp) {
  // Definitions of declarations injected also into cling.
  interp.declare("double pow2(double x) { return x*x; }");
  interp.declare("#include <clad/Differentiator/Differentiator.h>");
  interp.declare("auto dfdx = clad::differentiate(pow2, 0);");

  cling::Value res; // Will hold the evaluation result.
  interp.process("dfdx.getFunctionPtr();", &res);

  using func_t = double(double);
  func_t* pFunc = res.getAs<func_t*>();
  printf("dfdx at 1 = %f\n", pFunc(1));
}

int main(int argc, const char* const* argv) {
  std::vector<const char*> argvExt(argv, argv+argc);
  argvExt.push_back("-fplugin=etc/cling/plugins/lib/clad.dylib");
  // Create cling. LLVMDIR is provided as -D during compilation.
  cling::Interpreter interp(argvExt.size(), &argvExt[0], LLVMDIR);
  gimme_pow2dx(interp);
  return 0;
}

Output:

./clad-demo
dfdx at 1 = 2.000000

Conclusion

We have demonstrated Cling’s capabilities for template instantiation on demand; incorporating an interpreter in third-party code; and facilitating interpreter extension. The lazy template instantiation in an embedded interpreter provides a service which is very suitable for interoperability with C++. Extending such a service with domain-specific capabilities such as automatic differentiation can be a key enabler for various science cases and for other broader communities.

Acknowledgements

The author would like to thank Sylvain Corlay, Simeon Ehrig, David Lange, Chris Lattner, Javier Lopez Gomez, Wim Lavrijsen, Axel Naumann, Alexander Penev, Xavier Valls Pla, Richard Smith, Martin Vassilev, and Ioana Ifrim, who contributed to this post.

You can find out more about our activities at https://root.cern/cling/ and https://compiler-research.org.

LLVM meets Code Property Graphs

Tue, 23 Feb 2021 00:00:00 +0000

The code property graph (CPG) is a data structure designed to mine large codebases for instances of programming patterns via a domain-specific query language. It was first introduced in the proceedings of the IEEE Security and Privacy conference in 2014 (publication, PDF) in the context of vulnerability discovery in C system code and the Linux kernel in particular. The core ideas of the approach are the following:

the CPG combines several program representations into one
the CPG is stored in a graph database
the graph database comes with a DSL for traversing and querying the CPG

Currently, the CPG infrastructure is supported by several tools:

Ocular - a proprietary code analysis tool supporting Java, Scala, C#, Go, Python, and JavaScript languages
Joern - an open-source counterpart of Ocular supporting C and C++
Plume - an open-source tool supporting Java Bytecode

This article presents ShiftLeft’s open-source implementation of llvm2cpg - a standalone tool that brings LLVM Bitcode support to Joern. But before we dive into details, let us say a few more words about CPG and Joern.

Code Property Graph

The core idea of the CPG is that different classic program representations are merged into a property graph, a single data structure that holds information about the program’s syntax, control- and intra-procedural data-flow.
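To make the idea concrete, here is a minimal sketch of a property graph in C++. All names in it (`PropertyGraph`, `EdgeKind`, `addNode`, `buildExample`) are made up for illustration; this is not the schema used by Joern or llvm2cpg, only a toy showing how several representations can share one set of nodes and differ in the edge labels.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical edge labels: one per merged program representation.
enum class EdgeKind { Ast, Cfg, DataFlow };

struct Node {
  std::map<std::string, std::string> properties; // e.g. {"code", "x < MAX"}
};

struct Edge {
  std::size_t from;
  std::size_t to;
  EdgeKind kind;
};

// A single graph holding syntax, control-flow and data-flow edges together.
struct PropertyGraph {
  std::vector<Node> nodes;
  std::vector<Edge> edges;

  std::size_t addNode(std::map<std::string, std::string> props) {
    nodes.push_back(Node{std::move(props)});
    return nodes.size() - 1;
  }

  void addEdge(std::size_t from, std::size_t to, EdgeKind kind) {
    edges.push_back(Edge{from, to, kind});
  }

  // Follow only the edges of one representation, starting at `from`.
  std::vector<std::size_t> successors(std::size_t from, EdgeKind kind) const {
    std::vector<std::size_t> out;
    for (const Edge &e : edges)
      if (e.from == from && e.kind == kind)
        out.push_back(e.to);
    return out;
  }
};

// Builds a tiny graph for `int x = source(); if (x < MAX) ...`.
PropertyGraph buildExample() {
  PropertyGraph g;
  std::size_t assign = g.addNode({{"code", "x = source()"}});
  std::size_t cmp = g.addNode({{"code", "x < MAX"}});
  g.addEdge(assign, cmp, EdgeKind::Cfg);      // control flows to the check
  g.addEdge(assign, cmp, EdgeKind::DataFlow); // x is defined here, used there
  return g;
}
```

A query then amounts to walking the graph while switching between edge kinds, which is roughly what a traversal DSL expresses in a more compact form.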

Graphically speaking, the following piece of code:

void foo() {
  int x = source();
  if (x < MAX) {
    int y = 2 * x;
    sink(y);
  }
}

combines these three different representations:

into a single representation - Code Property Graph:

Joern

The property graph is stored in a graph database and made accessible via a domain-specific language (DSL) for graph traversals, which is used to identify programming patterns. The query language allows a seamless transition between the original code representations, making it possible to combine aspects of the code from the different views these representations offer.

One of the primary interfaces to code property graphs is a tool called Joern. It provides the DSL mentioned above and lets you query the CPG to discover specific properties of a program. Here are some examples of Joern’s DSL:

joern> cpg.typeDecl.name.p
List[String] = List("ANY", "int", "void")
joern> cpg.method.name.p
List[String] = List(
  "foo",
  "<operator>.multiplication",
  "source",
  "<operator>.lessThan",
  "<operator>.assignment",
  "sink"
)
joern> cpg.method("foo").ast.isControlStructure.code.p
List[String] = List("if (x < MAX)")
joern> cpg.method("foo").ast.isCall.map(c => c.file.name.head + ":" + c.lineNumber.get + " " + c.name + ": " + c.code).p
List[String] = List(
  "main.c:2 <operator>.assignment: x = source()",
  "main.c:2 source: source()",
  "main.c:3 <operator>.lessThan: x < MAX",
  "main.c:4 <operator>.assignment: y = 2 * x",
  "main.c:4 <operator>.multiplication: 2 * x",
  "main.c:5 sink: sink(y)"
)

Besides the DSL, Joern comes with a data-flow tracker enabling more sophisticated queries, such as “is there a user controlled malloc in the program?”

The DSL is much more powerful than the example suggests, but that is out of the scope of this article. Please refer to the documentation to learn more.

LLVM and CPG

This part is split into two smaller parts: the first one covers a few implementation details, the second one shows an example of how to use llvm2cpg. If you are not interested in the implementation - scroll down :)

Implementation Details

When we decided to add LLVM support for CPG, one of the first questions was: how do we map bitcode representation onto CPG?

We took a simple approach - let’s pretend the SSA representation is just a flat source program. In other words, the following bitcode

define i32 @sum(i32 %a, i32 %b) {
  %r = add nsw i32 %a, %b
  ret i32 %r
}

can be seen as a C program:

i32 sum(i32 a, i32 b) {
  i32 r = add(a, b);
  return r;
}

From the high-level perspective, the approach is simple, but there are some tiny details we had to overcome.

Instruction semantics

We can map some of the LLVM instructions back onto the internal CPG operations. Here are some examples:

add, fadd -> <operator>.addition
bitcast -> <operator>.cast
fcmp eq, icmp eq -> <operator>.equals
urem, srem, frem -> <operator>.modulo
getelementptr -> a combination of <operator>.pointerShift, <operator>.indexAccess, and <operator>.memberAccess depending on the underlying types of the GEP operand

Most of these <operator>.*s have special semantics, which plays a crucial role in the Joern and Ocular built-in data-flow trackers.

Unfortunately, not every LLVM instruction has a corresponding operator in the CPG. In those cases, we had to fall back to function calls. For example:

select i1 %cond, i32 %v1, i32 %v2 turns into select(cond, v1, v2)
atomicrmw add i32* %ptr, i32 1 turns into atomicrmwAdd(ptr, 1) (same for any other atomicrmw operator)
fneg float %val turns into fneg(val)
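The mapping described above can be sketched as a lookup table with a fallback. This is only an illustration built from the examples listed in this article: the function name `cpgNameFor` is hypothetical, the table is deliberately incomplete, and it is not how llvm2cpg is actually implemented.

```cpp
#include <string>
#include <unordered_map>

// Returns the CPG-level name for an LLVM opcode: a built-in <operator>.*
// where the article lists one, otherwise the opcode itself, which is then
// modelled as an ordinary function call.
std::string cpgNameFor(const std::string &opcode) {
  static const std::unordered_map<std::string, std::string> operators = {
      {"add", "<operator>.addition"},
      {"fadd", "<operator>.addition"},
      {"bitcast", "<operator>.cast"},
      {"urem", "<operator>.modulo"},
      {"srem", "<operator>.modulo"},
      {"frem", "<operator>.modulo"},
  };
  auto it = operators.find(opcode);
  if (it != operators.end())
    return it->second;
  return opcode; // fallback: e.g. "select" or "fneg" become plain calls
}
```

Keeping the known operators in one table makes it easy to see which instructions benefit from the built-in data-flow semantics and which ones are treated as opaque calls.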

The only instruction we could not map to the CPG is the phi: CPG doesn’t have a Phi node concept. We had to eliminate phi instructions using reg2mem machinery.

Redundancy

For a small C program

int sum(int a, int b) {
  return a + b;
}

Clang emits a lot of redundant instructions by default

define i32 @sum(i32 %0, i32 %1) {
  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  store i32 %0, i32* %3, align 4
  store i32 %1, i32* %4, align 4
  %5 = load i32, i32* %3, align 4
  %6 = load i32, i32* %4, align 4
  %7 = add nsw i32 %5, %6
  ret i32 %7
}

instead of a more concise version

define i32 @sum(i32 %0, i32 %1) {
  %3 = add nsw i32 %1, %0
  ret i32 %3
}

In general, this is not a problem, but it adds more complexity for the data-flow tracker and needlessly increases the graph’s size. We considered running optimizations before emitting the CPG for the bitcode. In the end, though, we decided to leave this work to the end user: if you want fewer instructions, apply the optimizations manually before emitting the CPG.

Type Equality

The other issue is related to the way LLVM handles types. If two modules in the same context use the same struct with the same name, LLVM renames the other struct to prevent name collisions. For example

; Module 1
%struct.Point = type { i32, i32 }

and

; Module 2
%struct.Point = type { i32, i32 }

when loaded into the same context yield two types

%struct.Point = type { i32, i32 }
%struct.Point.1 = type { i32, i32 }

We wanted to deduplicate these types for a better user experience and only emit Point in the final graph.

The obvious solution was to consider two structs with “similar” names and the same layout to be the same. However, we could not rely on llvm::StructType::isLayoutIdentical because, despite the name, it produces misleading results.

According to llvm::StructType::isLayoutIdentical, the structs Point and Pair have identical layouts, but PointWrap and PairWrap do not.

; these two have identical layout
%Point = type { i32, i32 }
%Pair = type { i32, i32 }

; these two DO NOT have identical layout
%PointWrap = type { %Point }
%PairWrap = type { %Pair }

This happens because llvm::StructType::isLayoutIdentical determines equality based on pointers. That is, the layouts are considered identical only if all the struct elements are the very same type objects; the comparison does not recurse into element types that are merely structurally equal. It also meant we could not use this approach to compare types from different LLVM contexts. We had to roll out our own solution based on Tree Automata to solve this issue.
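The difference between the two notions of equality can be shown on a toy type model. The sketch below is not the Tree Automata solution mentioned above and does not use LLVM's API; `Type`, `shallowIdentical` and `structurallyEqual` are hypothetical names. It also ignores recursive types, which a real implementation has to handle.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an IR type: either a scalar such as "i32" or a struct.
struct Type {
  std::string scalar;                              // non-empty for scalars
  std::vector<std::shared_ptr<const Type>> fields; // used for structs
};

using TypePtr = std::shared_ptr<const Type>;

TypePtr scalar(std::string name) {
  return std::make_shared<Type>(Type{std::move(name), {}});
}

TypePtr structOf(std::vector<TypePtr> fields) {
  return std::make_shared<Type>(Type{"", std::move(fields)});
}

// Shallow check: the element types must be the very same objects.
bool shallowIdentical(const Type &a, const Type &b) {
  if (a.scalar != b.scalar || a.fields.size() != b.fields.size())
    return false;
  for (std::size_t i = 0; i < a.fields.size(); ++i)
    if (a.fields[i] != b.fields[i]) // compares addresses, not contents
      return false;
  return true;
}

// Deep check: recurse into the element types instead of comparing pointers.
bool structurallyEqual(const Type &a, const Type &b) {
  if (a.scalar != b.scalar || a.fields.size() != b.fields.size())
    return false;
  for (std::size_t i = 0; i < a.fields.size(); ++i)
    if (!structurallyEqual(*a.fields[i], *b.fields[i]))
      return false;
  return true;
}
```

With `Point` and `Pair` both built from the same `i32` object, the shallow check accepts them, but it rejects `PointWrap` and `PairWrap` because `Point` and `Pair` are distinct objects. The deep check accepts both pairs.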

There are a few more details, but the article is getting longer than it needs to be. So let’s look at how to use llvm2cpg with Joern.

Example

Once you have Joern and llvm2cpg installed, the usage is straightforward:

Convert a program into LLVM Bitcode
Emit CPG
Load the CPG into Joern and start the analysis

Here are the steps codified:

$ cat main.c
extern int MAX;
extern int source();
extern void sink(int);
void foo() {
  int x = source();
  if (x < MAX) {
    int y = 2 * x;
    sink(y);
  }
}
$ clang -S -emit-llvm -g -O1 main.c -o main.ll
$ llvm2cpg -output=/tmp/cpg.bin.zip main.ll

Now the CPG is saved at /tmp/cpg.bin.zip. You can load it into Joern and check whether there is a flow from the source function to the sink:

$ joern
joern> importCpg("/tmp/cpg.bin.zip")
joern> run.ossdataflow
joern> def source = cpg.call("source")
joern> def sink = cpg.call("sink").argument
joern> sink.reachableByFlows(source).p
List[String] = List(
  """_____________________________________________________
| tracked               | lineNumber| method| file   |
|====================================================|
| source                | 5         | foo   | main.c |
| <operator>.assignment | 5         | foo   | main.c |
| <operator>.lessThan   | 6         | foo   | main.c |
| <operator>.shiftLeft  | 7         | foo   | main.c |
| <operator>.shiftLeft  | 7         | foo   | main.c |
| <operator>.assignment | 7         | foo   | main.c |
| sink                  | 8         | foo   | main.c |
"""
)

Which indeed exists!

Conclusion

To conclude, let us outline some of the advantages and constraints implied by LLVM Bitcode:

the “surface” of the LLVM language is smaller than that of C and C++
many high-level details do not exist at the IR level
the program must be compiled, thus limiting the range of programs that one can analyze with Joern

Here you can find more tutorials and information.

If you have any questions, feel free to ping Fabs or Alex on Twitter, or better, come over to the Joern chat.

Introducing Community.o and the Community.o Summit

Wed, 10 Feb 2021 00:00:00 +0000

The LLVM Foundation is excited to announce Community.o! This is a new face to the LLVM Foundation’s Diversity and Inclusion and Women in Compilers and Tools program. We’ve adopted a name to represent what this program hopes to accomplish which is to build a strong, healthy, and diverse open source community. We believe that this can only be achieved by including people from all backgrounds, genders, and experiences and by ensuring that everyone feels welcome, included, and empowered to contribute.

Why Community.o? The name is inspired by the compilation model where object files link together for a final program, much like how folks from different backgrounds come together to make up LLVM’s community and shared goal of inclusivity.

While our main focus is on the LLVM community and the field of compilers and tools, we want to collaborate with other open source communities and provide resources and exchange ideas.

Are you interested in learning more, getting involved with Community.o or joining a specific group such as Women in Compilers and Tools? Please see the Community.o website.

On March 8-10, we will host our first Community.o Summit! This 3-day virtual event is an inclusive space for underrepresented groups and newcomers, anywhere in their career, interested in learning and contributing to compilers, tools, and programming languages. We’ll highlight members from different open source communities making incredible efforts to support a thriving environment. Summit attendees will enjoy interesting talks, panels, workshops, and have ample networking opportunities. Additionally, this summit will be engaging for anyone interested in increasing the diversity within the LLVM community or their related affiliations.

While the event is free, attendance is limited to provide the best experience for attendees and allow for better collaboration and networking. Please see the Community.o Summit website for more information and to apply to attend by February 26, 2021.

Bringing Stack Clash Protection to Clang / X86 - the Open Source Way

Sat, 30 Jan 2021 10:00:00 +0000

Context

Stack clash is an attack that dates back to 2017, when the Qualys Research Team released an advisory with a joint blog post. It basically exploits large stack allocations (greater than PAGE_SIZE) that can lead to stack reads/writes not triggering the stack guard page allocated by the Linux kernel.

Shortly after the advisory was released, GCC provided a countermeasure activated by -fstack-clash-protection that basically consists in splitting large allocations into chunks of PAGE_SIZE, with a probe in each chunk to trigger the kernel stack guard page.

This has been a major security difference between GCC and Clang since then. It has even been identified as a blocker by Fedora to move from GCC to Clang as the compiler for some projects that already made the move upstream, leading to extra maintenance for packagers.

Support for this flag landed in Clang in 2020, only for X86, SystemZ and PowerPC. Its implementation is the result of a fruitful collaboration between LLVM, Firefox and Rust developers.

Rust already had a countermeasure implemented in the form of a runtime call to perform the stack probing. With LLVM catching up, a more lightweight approach was investigated in Rust.

Countermeasure Description

The Clang implementation for X86 is derived from the GCC implementation, with a few distinctions. The core ideas are:

1. thanks to the X86 calling convention, we get a free probe at each call site, which means that each function starts with a probed stack
2. when probing the stack in the function prologue, we don’t probe the tail of the allocation. Stated otherwise, if the stack size is PAGE_SIZE + PAGE_SIZE/2, we want to probe only once. This is important to limit the number of probes: if the stack size is lower than PAGE_SIZE, no probe is needed
3. because a signal can interrupt the execution flow at any time, at no point should we have two stack allocations (lower than PAGE_SIZE) without a probe in between.

The probing strategy for stack allocation varies based on the size of the stack allocation. If it’s smaller than PAGE_SIZE, thanks to (2) no probing is needed. If it’s below a small multiple of PAGE_SIZE, then the probing loop can be unrolled. Otherwise a probing loop alternates stack allocation of PAGE_SIZE bytes and probe, starting with the allocation thanks to (1).
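The size-based choice described above can be sketched as follows. This is a simplified model, not the code in LLVM: the page size of 4096 and the unroll threshold `kUnrollLimit` are assumptions picked for the example, and the function names are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint64_t kPageSize = 4096; // assumed page size
constexpr std::uint64_t kUnrollLimit = 4; // hypothetical "small multiple"

enum class ProbeStrategy { None, Unrolled, Loop };

// Picks the probing strategy for a static stack allocation of `stackSize`.
ProbeStrategy chooseStrategy(std::uint64_t stackSize) {
  if (stackSize < kPageSize)
    return ProbeStrategy::None; // the free probe at the call site is enough
  if (stackSize < kUnrollLimit * kPageSize)
    return ProbeStrategy::Unrolled; // a few inline allocate-then-probe steps
  return ProbeStrategy::Loop; // alternate PAGE_SIZE allocation and probe
}

// One probe per full page; the tail of the allocation is left unprobed.
std::uint64_t probeCount(std::uint64_t stackSize) {
  return stackSize / kPageSize;
}
```

For a frame of PAGE_SIZE + PAGE_SIZE/2 bytes this model yields a single probe, matching the example given for idea (2).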

A side effect of (2) is that when performing a dynamic allocation, we need to probe before updating the stack, otherwise we get a hole in the protection. This probe cannot be done after the stack update, even with an offset, because of (3). Otherwise we end up with a bug like this one found in GCC.

The following scheme attempts to summarize the allocation and probing interaction between static and dynamic allocations:

+ ----- <- ------------ <- ------------- <- ------------ +
|                                                        |
[free probe] -> [page alloc] -> [alloc probe] -> [tail alloc] + -> [dyn probe] -> [page alloc] -> [dyn probe] -> [tail alloc] +
                                                              |                                                               |
                                                              + <- ----------- <- ------------ <- ----------- <- ------------ +

Validation with Firefox

Firefox provides an amazing test bench to evaluate the impact of compiler changes. Indeed, with more than 12 MLOC of C/C++ and 3 MLOC of Rust built using PGO/LTO and XLTO, most of the important cases are covered.

Moreover, since Firefox is supported on a large set of operating systems and architectures, it was a great way to test the stack clash protection on a varied set of configurations.

The work is detailed in bug 1588710.

Functional Testing

To make sure that Firefox would perform as expected, we leveraged its huge test suite to verify that the product would still work with this option.

We used try auto, a new command that runs the most appropriate set of tests for this kind of change during the development phase. Then, once the patch landed in Mozilla-central (Firefox nightly), the whole test suite was executed, representing about 29 days of machine time for about 9000 tasks.

Thanks to this infrastructure, we have identified an issue with alloca(0) generating buggy machine code. Fortunately, the fix was already in the trunk version of LLVM. We cherry-picked the fix in our custom Clang build which addressed our issue.

Performance Testing

Over the years, Mozilla has developed a few tools to evaluate the performance impact of changes, from micro-benchmarks to page loads. These tools have been key to improving Firefox’s overall performance, but also to evaluating the impact of the move to Clang on all platforms, done a couple of years ago.

The usual procedure to evaluate performance improvements/regressions is to:

Run two builds with benchmarks. One without the patch, one with it.
Leverage the tooling to rerun the benchmark (usually 5 to 20 times) to limit the noise.
Compare the various benchmarks to see if significant regressions can be identified.

In the context of this project, we ran the usual benchmarks sensitive to C++ changes and did not identify any regression in terms of performance.

Current status

Firefox nightly on Linux has been compiled with the stack clash protection option since January 8th, 2021. We have not detected any regressions since it landed. If everything goes well, this change should ship with Firefox 86 (planned for mid-February 2021).

Validation With Rust

Rust has long supported the callback style of the LLVM probe-stack attribute, using the function __rust_probestack defined in its own compiler builtins library. In Rust’s spirit of safety, this attribute is added to all functions, letting LLVM sort out which actually need probing. However, forcing such a call into every function with a large stack frame is not ideal for performance, especially for those cases that could use just a few unrolled probes inline. Furthermore, Rust only has this callback implemented for its Tier 1 (most supported) targets, namely i686 and x86_64, leaving other architectures without protection so far. Therefore, letting LLVM generate inline stack probes is beneficial both for the performance of avoiding a call and for the increased architecture support.

Since the Rust compiler is written in Rust itself, with stack probing enabled by default, it makes a great functional test for any new code generation feature. The compiler is bootstrapped in stages, first building with a prior version, then rebuilding with the result of that first stage. Codegen issues are often revealed if the compiler crashes during that rebuild, and experiments with inline stack probes were no different, leading to fixes in D82867 and D90216. Both of these were simple errors that were not apparent in existing FileCheck tests, showing the importance of actually executing generated code.

An issue also led to the realization that there was a more general bug impacting both the GCC and LLVM implementations of -fstack-clash-protection, leading to a new patch set on the LLVM side. Essentially, the observed behavior is the following:

Alignment requirements behave similarly to allocation with respect to the stack: they (may) make it grow. For instance, the stack allocation for char foo[4096] __attribute__((aligned(2048))); is done through:

and rsp, -2048
sub rsp, 6024

Both the and and the sub actually update the stack! To take that effect into account, the LLVM patch considers the and rsp, -2048 as a sub rsp, 2048 when computing the probing distance, which means considering the worst-case scenario.
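A small model shows why treating the alignment as an allocation of 2048 bytes is a safe upper bound. The helper names below are hypothetical; the code only mimics what `and rsp, -align` does to a stack pointer value for a power-of-two alignment.

```cpp
#include <cstdint>

// Effect of `and rsp, -align` for a power-of-two `align`: round rsp down.
std::uint64_t alignDown(std::uint64_t rsp, std::uint64_t align) {
  return rsp & ~(align - 1);
}

// How far the stack pointer moved because of the alignment alone.
std::uint64_t alignmentDrop(std::uint64_t rsp, std::uint64_t align) {
  return rsp - alignDown(rsp, align);
}
```

Depending on the incoming value of rsp, the drop is anywhere between 0 and align - 1 bytes, so it is always below align. Since the incoming value is not known at compile time, counting the full alignment is the conservative choice.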

For future work on the Rust side, inline stack probes will replace __rust_probestack on i686 and x86_64 soon in Rust pr77885, and that will include perf results to monitor the effect. After that, additional architectures can be functionally tested and enabled for inline stack probes as well, increasing the reach of Rust’s memory safety.

Validation with a Binary Tracer

None of the above validates the security aspect of the protection. To have more confidence in the actual probing scheme implementation, we implemented a binary tracer based on the (awesome) QBDI Dynamic Binary Instrumentation framework. This Proof Of Concept (POC) is available on GitHub: stack-clash-tracer

This tool instruments all stack allocations and memory accesses of a running binary, logs them and checks that no stack allocation is greater than PAGE_SIZE and that there is an actual probe between two allocations.
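The two checks can be sketched on a simplified event trace. This is not the tracer's code: `Event`, `EventKind` and `checkTrace` are hypothetical names, the page size is assumed to be 4096, and a real tracer also has to decide which memory accesses count as probes of the newly allocated area. The first error message mirrors the tracer output shown in this post; the second one is invented for the sketch.

```cpp
#include <cstdint>
#include <string>
#include <vector>

constexpr std::uint64_t kPageSize = 4096; // assumed page size

enum class EventKind { Alloc, Probe };

// Simplified trace event: a stack allocation of `size` bytes, or a memory
// access that touches the newly allocated stack area (a probe).
struct Event {
  EventKind kind;
  std::uint64_t size; // only meaningful for Alloc
};

// Returns one message per violation of the two rules from the text.
std::vector<std::string> checkTrace(const std::vector<Event> &trace) {
  std::vector<std::string> errors;
  bool unprobedAlloc = false; // an allocation happened with no probe since
  for (const Event &e : trace) {
    if (e.kind == EventKind::Probe) {
      unprobedAlloc = false;
      continue;
    }
    if (e.size > kPageSize)
      errors.push_back("stack allocation is too big (" +
                       std::to_string(e.size) + ")");
    if (unprobedAlloc)
      errors.push_back("two stack allocations without a probe in between");
    unprobedAlloc = true;
  }
  return errors;
}
```

Splitting a large allocation into page-sized chunks with a probe after each one, as the countermeasure does, produces a trace that passes both checks.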

Here is a sample session that showcases large stack allocation issues:

$ cat main.c
#include <alloca.h>
#include <string.h>
int main(int argc, char**argv) {
  char buffer[5000];
  strcpy(buffer, argv[0]);
  char* dynbuffer = alloca(argc * 1000);
  strcpy(dynbuffer, argv[0]);
  return buffer[argc] + dynbuffer[argc];
}
$ gcc main.c -o main
$ LD_PRELOAD=./libstack_clash_tracer.so ./main 1
[sct][error] stack allocation is too big (5024)
$ LD_PRELOAD=./libstack_clash_tracer.so ./main 1 2 3 4 5
[sct][error] stack allocation is too big (5024)
[sct][error] stack allocation is too big (6016)

The same code, compiled with -fstack-clash-protection, is safer (apart from the stupid use of strcpy, that is)

$ gcc main.c -fstack-clash-protection -o main
$ LD_PRELOAD=./libstack_clash_tracer.so ./main 1
$ LD_PRELOAD=./libstack_clash_tracer.so ./main 1 2 3 4 5

Small bonus of this compiler-independent approach: we can verify both the GCC and Clang implementations :-)

$ clang main.c -fstack-clash-protection -o main
$ LD_PRELOAD=./libstack_clash_tracer.so ./main 1
$ LD_PRELOAD=./libstack_clash_tracer.so ./main 1 2 3 4 5

To come back to the Firefox test case, before we landed the change, we could see:

$ LD_PRELOAD=./libstack_clash_tracer.so firefox-bin
[sct][error] stack allocation is too big (4168)

Once Firefox nightly shipped with stack clash protection, this warning disappeared.

Conclusion

Aside from the technical aspects of the countermeasure, it is interesting to note that its Clang implementation was derived from the GCC implementation, but led to an issue being reported in the GCC codebase. The Clang-generated code was validated by Firefox people and tested by Rust people, who reported several bugs, some impacting both the Clang and GCC implementations: the circle is complete!

References

Interactive C++ for Data Science

Sun, 20 Dec 2020 10:00:00 +0000

In our previous blog post “Interactive C++ with Cling” we mentioned that exploratory programming is an effective way to reduce the complexity of the problem. This post will discuss some applications of Cling developed to support data science researchers. In particular, interactively probing data and interfaces makes complex libraries and complex data more accessible to users. We aim to demonstrate some of Cling’s features at scale; Cling’s eval-style programming support; projects related to Cling; and show interactive C++/CUDA.

Eval-style programming

A Cling instance can access itself through its runtime. The example creates a cling::Value to store the execution result of the incremented variable i. That mechanism can be used further to support dynamic scopes extending the name lookup at runtime.

[cling]$ #include <cling/Interpreter/Value.h>
[cling]$ #include <cling/Interpreter/Interpreter.h>
[cling]$ int i = 1;
[cling]$ cling::Value V;
[cling]$ gCling->evaluate("++i", V);
[cling]$ i
(int) 2
[cling]$ V
(cling::Value &) boxes [(int) 2]

V “boxes” the expression result providing extended lifetime if necessary. The cling::Value can be used to communicate expression values from the interpreter to compiled code.

[cling]$ ++i
(int) 3
[cling]$ V
(cling::Value &) boxes [(int) 2]

This mechanism delays evaluation until runtime, which enables some features that increase the dynamic look and feel of the C++ language.

The ROOT data analysis package

The main tool for storage, research and visualization of scientific data in the field of high energy physics (HEP) is the specialized software package ROOT. ROOT is a set of interconnected components that assist scientists from data storage and research to their visualization when published in a scientific paper. ROOT has played a significant role in scientific discoveries such as gravitational waves, the great cavity in the Pyramid of Cheops, and the discovery of the Higgs boson by the Large Hadron Collider. For the last 5 years, Cling has helped to analyze 1 EB of physics data, serving as a basis for over 1000 scientific publications, and supports software run across a distributed million CPU core computing facility.

ROOT uses Cling as a reflection information service for data serialization. The C++ objects are stored in a binary format, vertically. The content of a loaded data file is made available to the users and C++ objects become first-class citizens.

A central component of ROOT enabled by Cling is eval-style programming. We use this in HEP to make it easy to inspect and use C++ objects stored by ROOT. Cling enables ROOT to inject available object names into the name lookup when a file is opened:

[root] ntuple->GetTitle()
error: use of undeclared identifier 'ntuple'
[root] TFile::Open("tutorials/hsimple.root"); ntuple->GetTitle() // #1
(const char *) "Demo ntuple"
[root] gFile->ls();
TFile** tutorials/hsimple.root Demo ROOT file with histograms
 TFile* tutorials/hsimple.root Demo ROOT file with histograms
  OBJ: TH1F hpx This is the px distribution : 0 at: 0x7fadbb84e390
  OBJ: TNtuple ntuple Demo ntuple : 0 at: 0x7fadbb93a890
  KEY: TH1F hpx;1 This is the px distribution
  [...]
  KEY: TNtuple ntuple;1 Demo ntuple
[root] hpx->Draw()

The ROOT framework injects additional names into the name lookup in two stages. First, it builds an invalid AST by marking the occurrence of ntuple (#1), which is then transformed into gCling->EvaluateT</*return type*/void>("ntuple->GetTitle()", /*context*/); In the next stage, at runtime, ROOT opens the file, reads its preamble and injects the names via the external name lookup facility in Clang. The transformation becomes more complex if ntuple->GetTitle() takes arguments.

Figure 1. Interactive plot of the px distribution read from a root file.

C++ in Notebooks

Section Author: Sylvain Corlay, QuantStack

The Jupyter Notebook technology allows users to create and share documents that contain live code, equations, visualizations and narrative text. It enables data scientists to easily exchange ideas or collaborate by sharing their analyses in a straightforward and reproducible way. Language agnosticism is a key design principle for the Jupyter project, and the Jupyter frontend communicates with the kernel (the part of the infrastructure that runs the code) through a well-specified protocol. Kernels have been developed for dozens of programming languages, such as R, Julia, Python, and Fortran (through the LLVM-based LFortran project).

Jupyter’s official C++ kernel relies on Xeus, a C++ implementation of the kernel protocol, and Cling. An advantage of using a reference implementation for the kernel protocol is that a lot of features come for free, such as rich mime type display, interactive widgets, auto-complete, and much more.

Rich mime-type rendering for user-defined types can be specified by providing an overload of mime_bundle_repr for the said type, which is picked up by argument dependent lookup.

Figure 2. Inline rendering of images in JupyterLab for a user-defined image type.

Possibilities with rich mime type rendering are endless, such as rich display of dataframes with HTML tables, or even mime types that are rendered in the front-end with JavaScript extensions.

An advanced example making use of rich rendering with Mathjax is the SymEngine symbolic computing library.

Figure 3. Using rich mime type rendering in Jupyter with the Symengine package.

Xeus-cling comes along with an implementation of the Jupyter widgets protocol which enables bidirectional communication with the backend.

Figure 4. Interactive widgets in the JupyterLab with the C++ kernel.

More complex widget libraries have been enabled through this framework like xleaflet.

Figure 5. Interactive GIS in C++ in JupyterLab with xleaflet.

Other features include rich HTML help for the standard library and third-party packages:

Figure 6. Accessing cppreference for std::vector from JupyterLab by typing `?std::vector`.

The Xeus and Xeus-cling kernels were recently incorporated as subprojects to Jupyter, and are governed by its code of conduct and general governance.

Planned future developments for the xeus-cling kernel include: adding support for the Jupyter console interface, through an implementation of the Jupyter is_complete message, currently lacking; adding support for cling “dot commands” as Jupyter magics; and supporting the new debugger protocol that was recently added to the Jupyter kernel protocol, which will enable the use of the JupyterLab visual debugger with the C++ kernel.

Another tool that brings interactive plotting features to xeus-cling is xvega, which is at an early stage of development and produces Vega charts that can be displayed in the notebook.

Figure 7. The xvega plotting library in the xeus-cling kernel.

CUDA C++

Section Author: Simeon Ehrig, HZDR

The Cling CUDA extension brings the workflows of interactive C++ to GPUs without losing performance and compatibility to existing software. To execute CUDA C++ code, Cling activates an extension in the compiler frontend to understand the CUDA C++ dialect and creates a second compiler instance that compiles the code for the GPU.

Figure 8. CUDA/C++ information flow in Cling.

Like the normal C++ mode, the CUDA C++ mode uses AST transformation to enable interactive CUDA C++ or special features such as the Cling print system. In contrast to the normal Cling compiler pipeline used for the host code, the device compiler pipeline does not use all the transformations of the host pipeline. Therefore, the device pipeline has some special transformations.

[cling] #include <iostream>
[cling] #include <cublas_v2.h>
[cling] #pragma cling(load "libcublas.so") // link a shared library
// set parameters
// allocate memory
// ...
[cling] __global__ void init(float *matrix, int size){
[cling] ?   int x = blockIdx.x * blockDim.x + threadIdx.x;
[cling] ?   if (x < size)
[cling] ?     matrix[x] = x;
[cling] ? }
[cling]
[cling] // launching a function direct in the global space
[cling] init<<<blocks, threads>>>(d_A, dim*dim);
[cling] init<<<blocks, threads>>>(d_B, dim*dim);
[cling]
[cling] cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, dim, dim, dim, &alpha, d_A, dim, d_B, dim, &beta, d_C, dim);
[cling] cublasGetVector(dim*dim, sizeof(h_C[0]), d_C, 1, h_C, 1);
[cling] cudaGetLastError()
(cudaError_t) (cudaError::cudaSuccess) : (unsigned int) 0

Like the normal C++ mode, the CUDA mode can be used in a Jupyter Notebook.

Figure 9. CUDA/C++ information flow in Cling.

A special property of Cling in CUDA mode is that the Cling application becomes a normal CUDA application at the time of the first CUDA API call. This enables the use of the CUDA SDK with Cling. For example, you can use the CUDA profiler, nvprof ./cling -xcuda, to profile your interactive application. This docker container can be used to experiment with Cling’s CUDA mode.

Planned future developments for the CUDA mode include: supporting the complete current CUDA API; redefining CUDA kernels; and supporting other GPU SDKs such as HIP (AMD) and SYCL (Intel).

Conclusion

We see interactive C++ as an important tool for researchers in the data science community. Cling has enabled ROOT to be the “go to” data analysis tool in the field of High Energy Physics for everything from efficient I/O to plotting and fitting. The interactive CUDA backend allows easy integration of research workflows and simpler communication between C++ and CUDA. As Jupyter Notebooks have become a standard way for data analysts to explore ideas, Xeus-cling ensures that great interactive C++ ingredients are available in every C++ notebook.

In the next blog post we will focus on Cling enabling features beyond interactive C++, and in particular language interoperability.

Acknowledgements

You can find out more about our activities at https://root.cern/cling/ and https://compiler-research.org.

Interactive C++ with Cling

Mon, 30 Nov 2020 10:00:00 +0000

The C++ programming language is used for many numerically intensive scientific applications. A combination of performance and solid backward compatibility has led to its use for many research software codes over the past 20 years. Despite its power, C++ is often seen as difficult to learn and inconsistent with rapid application development. Exploration and prototyping are slowed down by the long edit-compile-run cycles during development.

Cling has emerged as a recognized capability that brings interactivity, dynamic interoperability and rapid prototyping to C++ developers. Cling supports the full C++ feature set including the use of templates, lambdas, and virtual inheritance. Cling is an interactive C++ interpreter, built on top of the Clang and LLVM compiler infrastructure. The interpreter enables interactive exploration and makes the C++ language more welcoming for research.

The main tool for storage, research and visualization of scientific data in the field of high energy physics (HEP) is the specialized software package ROOT. ROOT is a set of interconnected components that assist scientists from data storage and research through to visualization when publishing in a scientific paper. ROOT has played a significant role in scientific discoveries such as gravitational waves, the great cavity in the Pyramid of Cheops, and the discovery of the Higgs boson by the Large Hadron Collider. For the last 5 years, Cling has helped to analyze 1 EB of physical data, serving as a basis for over 1000 scientific publications, and supports software run across a distributed computing facility with a million CPU cores.

Recently we started a project aiming to leverage our experience in interactive C++, just-in-time compilation technology (JIT), dynamic optimizations, and large scale software development to greatly reduce the impedance mismatch between C++ and Python. We will generalize Cling to offer a robust, sustainable and omnidisciplinary solution for C++ language interoperability. The scope of our objectives is to:

advance the interpretative technology to provide a state-of-the-art C++ execution environment,
enable functionality which can provide native-like, dynamic runtime interoperability between C++ and Python (and eventually other languages such as Julia and Swift),
allow seamless utilization of heterogeneous hardware (such as hardware accelerators).

Project results will be integrated into the widely used tools LLVM, Clang and Cling. The outcome of the proposed work is a platform which provides a C++ compiler as a service (CaaS) for both rapid application development and computational performance.

The rest of this post intends to demonstrate the design and several features of Cling. Want to follow along? You can get cling from conda:

conda config --add channels conda-forge
conda install cling
conda install llvmdev=5.0.0

or from docker-hub if you don’t already use conda:

docker pull compilerresearch/cling
docker run -t -i compilerresearch/cling

Either way, type “cling” to start its interactive shell:

cling
****************** CLING ******************
* Type C++ code and press enter to run it *
*             Type .q to exit             *
*******************************************
[cling]$
[cling]$ #include "cling/Interpreter/Interpreter.h"
[cling]$ gCling->allowRedefinition(false)

We will explain the purpose of these commands, and other alternatives for using cling, in further parts of this post.

Interpreting C++

Exploratory programming (or Rapid Application Development) is an effective way to gain understanding of the requirements for a project; to reduce the complexity of the problem; and to provide an early validation of the system design and implementation. In particular, interactively probing data and interfaces makes complex libraries and complex data more accessible to users. It is important in data science, computational science and debugging. It significantly reduces the time consumed by edit-run cycles during development. In practice, only a few programming languages offer both a compiler and an interpreter translating them into machine code, although whether a language is to be interpreted or compiled is a property of the implementation.

Languages which enable exploratory programming tend to have interpreters which shorten the compile-link cycle; this generally has a noticeable cost in performance. Language developers who acknowledge the use case of exploratory programming may also add syntactic sugar, but that is mostly for convenience and terseness. The performance penalty is largely mitigated by using just-in-time (JIT) or ahead-of-time (AOT) compilation technology.

For the sake of this post series, interpreting C++ means enabling exploratory programming for C++ while mitigating the performance cost with JIT compilation. Figure 1 shows an illustrative example of exploratory programming. It becomes trivial to orient the shape, choose size and color or compare to previous settings. The invisible compile-link cycle aids interactive use, which allows some qualitatively different approaches to program development and enhanced productivity.

Figure 1. Interactive OpenGL Demo, adapted from [here](https://www.youtube.com/watch?v=eoIuqLNvzFs).

Design principles

Some of the design goals of cling include:

Do not pay for what you do not use – prioritize performance of processing correct code. For example, in order to provide error recovery do not penalize users typing syntactically and semantically correct C++; and interactive C++ transformations are only done when necessary and can be disabled.
Reuse Clang & LLVM at (almost) any cost – do not reinvent the wheel. If a feature is not available, then try finding a minimalistic way to implement it and propose it for a review to the LLVM community. Otherwise find the minimal patch, even at the cost of misusing API, which satisfies the requirements.
Continuous feature delivery – focus on a minimal feature, its integration in the main use-case (ROOT), deployment in production, repeat.
Library design – allow Cling to be used as a library from third party frameworks.
Learn and evolve – experiment with user experience. There is no formal specification or consensus on overall user experience. Apply lessons learned from the legacy of CINT.

Architecture

Cling accepts partial input and ensures that the compiler process keeps running to act on code as it comes in. It includes an API providing access to the properties of recently compiled chunks of code. Cling can apply custom transformations to each chunk before execution. Cling orchestrates the existing LLVM and Clang infrastructure following a data flow described in Figure 2.

Figure 2. Information flow in Cling

In short:

The tool controls the input infrastructure by interactive prompt or by an interface allowing the incremental processing of input (➀).
It sends the input to the underlying clang library for compilation (➁).
Clang compiles the input, possibly wrapped into a function, into an AST (➂).
When necessary the AST is further transformed in order to attach specific behavior (➃), for example, reporting execution results or other interpreter-related features.

Once the high-level AST representation is ready, it is sent for lowering to an LLVM-specific assembly format, the LLVM IR (➄). The LLVM IR is the input format for LLVM’s just-in-time compilation infrastructure. Cling instructs the JIT to run specified functions (➅), translating them into machine code (MC) targeting the underlying device architecture (e.g. Intel x86 or NVPTX) (➆, ➇).

The C++ standard is developed towards compilers and does not cover interactive use well. Execution of statements on the global scope, reporting execution results, and entity redefinitions are the three most important features when it comes to user friendliness. Long running interpreter sessions are prone to typing errors and make flawless error recovery essential. More advanced use-cases require extra flexibility at runtime and lookup rules extensions aiding eval-style programming. Efficient watermark-based code removal is important when C++ is used as a scripting language.

Execution of statements

Cling processes C++ incrementally. Incremental input consists of one or multiple C++ statements. C++ does not allow expressions in the global scope.

[cling] #include <vector>
[cling] #include <iostream>
[cling] std::vector<int> v = {1,2,3,4,5}; v[0]++;
[cling] std::cout << "v[0]=" << v[0] <<"\n";
v[0]=2

Instead, Cling moves each input into a unique wrapper function. For example:

void __unique_1 () { std::vector<int> v = {1,2,3,4,5};v[0]++;; } // #1
void __unique_2 () { std::cout << "v[0]=" << v[0] <<"\n";; } // #2

After the clang AST is built, cling detects that wrapper #1 contains a declaration and moves the declaration’s AST node to the global scope, such that v can be referenced by subsequent inputs. Wrapper #2 contains a statement and is executed as is. Internally to Cling, the example is transformed to:

#include <vector>
#include <iostream>
std::vector<int> v = {1,2,3,4,5};
void __unique_1 () { v[0]++;; }
void __unique_2 () { std::cout << "v[0]=" << v[0] <<"\n";; }

Cling runs these wrappers after they are compiled to machine code.
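To make the effect of the transformation concrete, here is a minimal sketch of the transformed session written as ordinary, ahead-of-time compiled C++. The wrapper names unique_1 and unique_2 and the driver run_session are hypothetical stand-ins chosen for illustration (names beginning with a double underscore are reserved for the implementation); this is not Cling's generated code.

```cpp
#include <cassert>
#include <iostream>
#include <vector>

// The declaration has been hoisted out of the first wrapper,
// so every later input can refer to it.
std::vector<int> v = {1, 2, 3, 4, 5};

// Hypothetical wrapper names; each holds the statements of one input line.
void unique_1() { v[0]++; }
void unique_2() { std::cout << "v[0]=" << v[0] << "\n"; }

// Runs the wrappers in input order, as an interpreter session would.
void run_session() {
  unique_1();
  unique_2();
  assert(v[0] == 2);
}
```

Calling run_session() once prints v[0]=2, matching the interactive session shown earlier.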

Reporting execution results

An integral part of interactivity is printing expression values. Typing printf each time is laborious and does not naturally include object type information. Instead, omitting the semicolon of the last statement of the input tells Cling to report the expression result. When wrapping the input, Cling textually attaches a semicolon to the end of it. If an execution report is requested, the corresponding wrapper AST does not contain a NullStmt (modelling extra semicolons).

[cling] #include <vector>
[cling] std::vector<int> v = {1,2,3,4,5} // Note the missing semicolon
(std::vector<int> &) { 1, 2, 3, 4, 5 }

A transformation injects extra code depending on the properties of the particular entity, such as whether it is copyable, whether it is a wrapper temporary, or whether it is an array. Cling can report information about non-copyable or temporary objects by providing a ‘managed’ storage. The managed storage (cling::Value) is also used for exchanging values between interpreted and compiled code in an embedded setup.
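As a rough illustration of what the injected reporting code has to produce, the sketch below formats a vector in the same "(type) { elements }" shape as the session output. The helper report is hypothetical and is not part of Cling; the type name is passed in as text because standard C++ offers no portable way to obtain it at run time, whereas Cling has the type available from the AST.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: renders "(type) { e1, e2, ... }" for a vector.
template <typename T>
std::string report(const std::string& typeName, const std::vector<T>& values) {
  std::ostringstream out;
  out << "(" << typeName << ") {";
  for (std::size_t i = 0; i < values.size(); ++i)
    out << (i == 0 ? " " : ", ") << values[i];
  out << (values.empty() ? "}" : " }");
  return out.str();
}

// Small usage example mirroring the session output.
void report_example() {
  std::vector<int> v = {1, 2, 3, 4, 5};
  assert(report("std::vector<int> &", v) ==
         "(std::vector<int> &) { 1, 2, 3, 4, 5 }");
}
```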

Entity Redefinition

Name redefinition is an important scripting feature. It is also essential for notebook-based C++ as each cell is a somewhat separate computation. C++ does not support redefinitions of entities.

[cling] #include <string>
[cling] std::string v
(std::string &) ""
[cling] #include <vector>
[cling] std::vector<int> v
input_line_7:2:19: error: redefinition of 'v' with a different type: 'std::vector<int>' vs 'std::string' (aka 'basic_string<char, char_traits<char>, allocator<char> >')
 std::vector<int> v
                  ^
input_line_4:2:14: note: previous definition is here
 std::string v
             ^

Cling implements entity redefinition using inline namespaces and rewires clang lookup rules to give higher priority to more recent declarations. The full description of this feature was published as a conference paper at CC 2020 (ACM conference on Compiler Construction). We enable it by calling gCling->allowRedefinition():

[cling] #include "cling/Interpreter/Interpreter.h"
[cling] gCling->allowRedefinition()
[cling] #include <vector>
[cling] std::vector<int> v
(std::vector<int> &) {}
[cling] #include <string>
[cling] std::string v
(std::string &) ""
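The role of inline namespaces can be sketched in plain C++. In the snippet below, the namespace names definition_1 and definition_2 and the accessor latest_v are hypothetical and only illustrate the idea; they are not the names Cling generates. Two entities called v can coexist because each lives in its own inline namespace, but in standard C++ an unqualified use of v would then be ambiguous, which is why Cling additionally changes the lookup rules to prefer the most recent declaration. The sketch stands in for that step by qualifying the name explicitly.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical namespace names, one per definition of `v`.
inline namespace definition_1 { std::string v; }
inline namespace definition_2 { std::vector<int> v; }

// Standard lookup cannot choose between the two entities,
// so this sketch selects the most recent definition by hand.
std::vector<int>& latest_v() { return definition_2::v; }

void redefinition_example() {
  latest_v().push_back(42);
  assert(definition_2::v.size() == 1);
  assert(definition_1::v.empty());  // the older entity still exists
}
```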

Invalid Code. Error Recovery

When used in interactive mode, invalid C++ does not terminate the session. Instead, invalid code is discarded. The underlying clang process keeps the invalid AST nodes in its internal data structures for better error diagnostics and recovery, expecting the process will end shortly after issuing the diagnostics. The example below is more challenging because it contains both valid and invalid constructs. The error recovery should undo a significant amount of changes in internal structures such as the name lookup and the AST. Cling is used in many high-performance environments; using checkpointing is not a viable option as it introduces overhead for correct code.

[cling] #include <vector>
[cling] std::vector<int> v; v[0].error_here;
input_line_4:2:26: error: member reference base type 'std::__1::__vector_base<int, std::__1::allocator<int> >::value_type' (aka 'int') is not a structure or union
 std::vector<int> v; v[0].error_here;
                     ~~~~^~~~~~~~~~~

In order to handle the example, Cling models the incremental input as a Transaction. A transaction represents the delta of the changes of internal data structures of Clang. Cling listens to events coming from various Clang callbacks such as declaration creation, deserialization and macro definition. This information is sufficient to undo the changes and continue with a valid state. The implementation is very intricate and in many cases requires extra work depending on the input declaration kinds.
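The transaction idea can be sketched in a few lines, under heavy simplification. Here the compiler state is reduced to a set of declared names, and the types SymbolTable and Transaction are hypothetical illustrations rather than Cling's classes: the transaction records what one input added, so a failed input can be undone without checkpointing the whole state.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the compiler state: just the known names.
struct SymbolTable {
  std::set<std::string> names;
};

// Records every name added while one input is processed.
class Transaction {
 public:
  explicit Transaction(SymbolTable& table) : table_(table) {}

  // Called for each "declaration created" event.
  void declare(const std::string& name) {
    if (table_.names.insert(name).second) added_.push_back(name);
  }

  // Keep the changes: the delta is no longer needed.
  void commit() { added_.clear(); }

  // Undo the changes in reverse order of creation.
  void rollback() {
    for (auto it = added_.rbegin(); it != added_.rend(); ++it)
      table_.names.erase(*it);
    added_.clear();
  }

 private:
  SymbolTable& table_;
  std::vector<std::string> added_;
};

void transaction_example() {
  SymbolTable table;

  Transaction good(table);
  good.declare("x");
  good.commit();

  Transaction bad(table);
  bad.declare("v");  // valid part of the input
  bad.rollback();    // a later error discards the whole input

  assert(table.names.count("x") == 1);
  assert(table.names.count("v") == 0);
}
```

Correct input only pays for appending to a small list, which is in the spirit of the "do not pay for what you do not use" design goal.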

Cling also protects against null pointer dereferences via a code transformation, avoiding a session crash.

[cling] int *p = nullptr; *p
input_line_3:2:21: warning: null passed to a callee that requires a non-null argument [-Wnonnull]
 int *p = nullptr; *p
                    ^
[cling]

The implementation of error recovery and code unloading still has rough edges and it is being improved constantly.

Code Removal

Incremental, interactive C++ assumes long-lived sessions where not only syntax errors but also semantic ones can happen. That poses an extra level of complexity if we want to re-execute the same code with minor adjustments.

[cling] .L Adder.h // #1, similar to #include "Adder.h"
[cling] Add(3, 1) // int Add(int a, int b) {return a - b; }
(int) 2
[cling] .U Adder.h // reverts the state prior to #1
[cling] .L Adder.h
[cling] Add(3, 1) // int Add(int a, int b) {return a + b; }
(int) 4

In the example, we include a header file with the .L meta command; “uninclude” it with .U; and “reinclude” it with .L to re-read the modified file. Unlike in the error recovery case, Cling cannot fence the machine code lowering infrastructure and needs to undo state changes in clang CodeGen and the llvm JIT and machine code infrastructure. The implementation of this feature requires expertise in a big portion of the LLVM toolchain.

Conclusion

Cling has been one of the systems enabling interactive C++ for more than a decade. Cling’s extensibility and fast prototyping features are of fundamental importance for researchers in high-energy physics, and an enabler for many of the technologies that they rely on. Cling has several unique features tailored to the challenges which come with incremental C++. Our work on interactive C++ is always evolving. In the next blog post we will focus on interactive C++ for Data Science; Eval-Style Programming; Interactive CUDA; and C++ in notebooks.

You can find out more about our activities at https://root.cern/cling/ and https://compiler-research.org.

Acknowledgements

The 2020 Virtual LLVM Developers' Meeting Program

Sun, 23 Aug 2020 17:00:06 -0700

The LLVM Foundation is excited to announce the 2020 Virtual LLVM Developers’ Meeting program! Registration will open this week.

Keynote:

Undef and Poison: Present and Future - J. Lee

Technical Talks:

Clang & Linux: Asm Goto with Outputs - B. Wendling; N. Desaulniers
LLVM Libc: Current Status, Challenges and Future Plans - S. Reddy; G. Chatelet; P. Asker; D. Finkelstein
Branch Coverage: Squeezing more out of LLVM Source-based Code Coverage - A. Phipps
Memory tagging in LLVM and Android - E. Stepanov; K. Serebryany; P. Collingbourne; M. Phillips
Towards a representation of arbitrary alias graph in LLVM IR for Fortran code - K. Li; T. Islam
Control-flow sensitive escape analysis in Falcon JIT - A. Pilipenko
Extending Clang for checking compliance with automotive coding standards - M. Vujosevic Janicic
An Update on Optimizing Multiple Exit Loops - P. Reames
Code Size Compiler Optimizations and Techniques - A. Kumar
Accelerate Matrix Multiplication Using the New POWER Outer Product Instructions - B. Saleil; J. Carvalho
CIL: Common MLIR Dialect for C/C++ and Fortran - P. NR; V. M; Ranjith; Srihari
Building compiler extension for LLVM 10.0.1 - S. Guelton
LLVM-based mutation testing for C and C++ - A. Denisov; S. Pankevich
Matrix Support in Clang and LLVM - F. Hahn
Adding CUDA® Support to Cling: JIT Compile to GPUs - S. Ehrig
The Present and Future of Interprocedural Optimization in LLVM - J. Doerfert; B. Homerding; S. Baziotis; S. Stipanovic; H. Ueno; K. Dinel; S. Okumura; L. Chen
Pushing Back Lit’s Boundaries to Test Libc++ - L. Dionne
Evolving “convergent”: Lessons from Control Flow in AMDGPU - N. Hähnle
How to update debug info in compiler transformations - A. Prantl; V. Kumar
Proposal for A Framework for More Effective Loop Optimizations - M. Kruse; H. Finkel
Changing Everything With Clang Plugins: A Story About Syntax Extensions, Clang’s AST, and Quantum Computing - H. Finkel; A. Mccaskey
(OpenMP) Parallelism-Aware Optimizations - J. Doerfert; S. Stipanovic; H. Mosquera; J. Chesterfield; G. Georgakoudis; J. Huber
Checked C: Adding memory safety support to LLVM - M. Grang; K. Kjeer

Tutorials:

Everything I know about debugging LLVM - N. Desaulniers
LLVM in a Bare Metal Environment - H. Qadeer
PGO: Add per-callsite counters - P. Kosov; Y. Sergey
Understanding Changes made by a Pass in the Opt Pipeline. - J. Schmeiser
Using clang-tidy for customized checkers and large scale source tree refactoring - V. Bridgers
Finding Your Way Around the LLVM Dependence Analysis Zoo - S. Baziotis; S. Moll
Using the clang static analyzer to find bugs - V. Bridgers
A Deep Dive into the Interprocedural Optimization Infrastructure - J. Doerfert; B. Homerding; S. Baziotis; S. Stipanovic; H. Ueno; K. Dinel; S. Okumura; L. Chen
MLIR Tutorial - M. Amini

Lightning Talks:

Finding and Outlining Similarities in LLVM IR - A. Litteken
A fast algorithm for global code motion of congruent instructions - A. Kumar; S. Pop
From Implicit Pass Dependencies to Effectiveness Prediction - H. Ueno; J. Doerfert; E. Park; G. Georgakoudis; T. Jayatilaka; S. Badruswamy
Using Clang as An Alternative C/C++ Frontend of The ROSE Source-to-Source Compiler - A. Wang; P. Lin; C. Liao; Y. Yan
OpenACC support in Flang with a MLIR dialect - V. Clement; J. Vetter
Fragmenting the DWARF to Enable Dead Debug Data Elimination - J. Henderson
Source-based Code Coverage for Embedded Use Cases - A. Phipps; C. Addison
pre-merge checks for LLVM - M. Goncharov; C. Kühnel
Getting stack size just right on XCore - J. McCrea
Compile Faster with the Program Repository and ccache - Y. Yi; P. Bowen-Huggett
GWP-TSan: Zero-Cost Detection of Data Races in Production - M. Morehouse; K. Serebryany
CompilerInvocation to -cc1 command line - D. Grumberg
Outer-Loop Vectorization Legality Analysis for RV: One Step Closer to a Powerful Vectorizer for LLVM - S. Baziotis
Flang Update - S. Scalpone
Code Feature Analysis, Tracking, and Future Usage - T. Jayatilaka; J. Doerfert; G. Georgakoudis; E. Park; H. Ueno; S. Badruswamy
Lowering XLA HLO using RISE - A Functional Pattern-based MLIR Dialect - M. Lücke; A. Smith; M. Steuwer
SYCL for CUDA: An overview of implementing PI for CUDA - A. Johnston
Extending LLDB to More Scripting Languages - J. Devlieghere
Adding a Subtarget Support to LLVM in Five Minutes - E. Yakubova

Birds of a Feather:

ClangBuiltLinux BoF - N. Desaulniers
Loop Optimization BoF - M. Kruse; K. Barton
LLVM Just-In-Time Compilers BoF - L. Hames
Code Size Optimization - S. Bartell; V. Adve

Student Research Competition

Enzyme: High-Performance Automatic Differentiation of LLVM - W. Moses; V. Churavy
SPAM: Stateless Permutation of Application Memory with LLVM - M. Ziad; M. Arroyo; S. Sethumadhavan
HPVM-FPGA: Leveraging Compiler Optimizations for Hardware-Agnostic FPGA Programming - A. Ejjeh; K. Kanwar; M. Kotsifakou; V. Adve; R. Rutenbar
Guided Linking: shrinking and speeding up dynamically linked code - S. Bartell; V. Adve
ApproxTuner: A Compiler and Runtime System for Adaptive Approximations - H. Sharif; M. Kotsifakou; Y. Zhao; A. Kothari; B. Schreiber; E. Wang; Y. Sarita; N. Zhao; K. Joshi; V. Adve; S. Misailovic; S. Adve

Posters:

CIRCT: MLIR for Hardware Design - S. Neuendorffer; C. Lattner; A. Wilson
An Approach to Generate Correctly Rounded Math Libraries for New Floating Point Variants - J. Lim; M. Aanjaneya; J. Gustafson; S. Nagarakatte
Compiling a Higher-Order Smart Contract Language to LLVM - V. Nagaraj; J. Johannsen; A. Trunov; G. Pirlea; A. Kumar; I. Sergey
To -jInfinity & Beyond - W. Moses; K. Kwok; L. Sha
llvm-diva – Debug Information Visual Analyzer - C. Enciso
Quickly Finding RISC-V Code Quality Issues with Differential Analysis - L. Marques
Error estimates of floating-point numbers and Jacobian matrix computation in Clad - V. Vassilev; A. Penev; R. Shakhov
Data Dependency using MSSA: Analysis and Contrast - R. Sharma
Connecting Clang to The ROSE Source-to-Source Compiler - A. Wang; P. Lin; C. Liao; Y. Yan
Incremental Compilation Support in Clang - V. Vassilev; D. Lange

Announcing the new LLVM Foundation Board

Fri, 21 Aug 2020 17:54:06 -0700

The LLVM Foundation is pleased to announce its new Board of Directors, which includes:

Kit Barton
Kristof Beyls
Mike Edwards (Treasurer)
Hal Finkel
Cyndy Ishida
Anton Korobeynikov
Tanya Lattner (President)
Chris Lattner
Tom Stellard (Secretary)

Three new members and six continuing members were elected to the nine person board. Thank you to retiring board members Chandler Carruth, Arnaud de Grandmaison, and John Regehr for all of their contributions to the board.

We were pleased to have many qualified applicants to the Board of Directors this year, which enabled us to make the board stronger than ever. Unfortunately, this also meant that we could not include everyone that we wanted to. Our goal was to create a balanced board of individuals from a wide range of backgrounds and locations, and to provide a voice to many groups within the LLVM community. This is always a challenge with such a large and vibrant community, and such a small board.

About the board of directors (listed alphabetically by last name):

Kit Barton:

Kit Barton has been contributing to LLVM since 2015. His contributions have primarily been to the PowerPC backend and loop optimizations including the loop fusion pass. He has presented multiple technical talks and tutorials at the LLVM Dev conferences over the last two years.

In addition to the contributions to LLVM, over the last two years Kit has driven the effort within IBM to migrate their proprietary C/C++ and Fortran compilers to leverage LLVM technology. He has also been involved in organizing the LLVM Meetups in Toronto as well as the Loop Optimization Working Group.

Kit is currently the technical lead for C/C++ and Fortran compilers on POWER and z/OS at IBM.

Kristof Beyls:

Kristof Beyls has worked on LLVM since about 2010, initially as part of tech leading the migration of Arm’s C/C++ toolchain to be based on LLVM technology. Since then, Kristof has worked on a large number of code generation projects using LLVM. He has contributed to LLVM in the areas of security mitigations, performance tuning, Arm backends, test-suite, LNT, etc.

He has been helping with the organization of EuroLLVM meetings since the start; has been organizing the FOSDEM LLVM dev rooms for the past couple of years and has organized a few socials in Belgium. He has also been on the program committee for a few of the dev meetings.

Kristof is Senior Principal Engineer at Arm.

Mike Edwards:

Mike Edwards has been involved with the LLVM Project since 2016 and has been most active behind the scenes working on infrastructure related issues. Mike joined the LLVM Foundation Board in 2018 and was elected Treasurer. Mike has used the past two years to help further the Foundation's programs and support the many efforts of the Foundation to reach new users of LLVM technologies. Mike is looking forward to working with the new Board Members elected this year to help further the program development and outreach of the Foundation.

Mike is currently working as a Software Engineer at Apple, Inc. working on the Continuous Integration and Quality Engineering efforts for LLVM and Clang development.

Hal Finkel:

Hal Finkel has been an active contributor to the LLVM project since 2011. He is the code owner for the PowerPC target, the alias-analysis infrastructure, and other components.

In addition to his numerous technical contributions, Hal has chaired the LLVM in HPC workshop, which is held in conjunction with Super Computing (SC), starting in 2014. This workshop provides a venue for the presentation of peer-reviewed HPC-related research in LLVM from both industry and academia. He has also been involved in organizing an LLVM-themed BoF session at SC and LLVM socials, in addition to organizing community technical calls for Flang and aliasing analysis.

Hal is Lead for Compiler Technology and Programming Languages at Argonne National Laboratory’s Leadership Computing Facility. His team at Argonne works on several LLVM-based projects. Hal also teaches a compilers course at the University of Chicago’s Masters Program in Computer Science.

Cyndy Ishida:

Cyndy Ishida is relatively new to the LLVM community. She began her involvement by contributing Mach-O Support to TextAPI in the past year, which serves as a condensed textual representation of dynamic libraries from a static linking perspective.

In addition to open source contributions, Cyndy has participated in the program committee for the 2020 US LLVM Developers’ Meeting and in the 2019 US Women in Compilers and Tools Workshop. With a long standing admiration for the LLVM Project and Foundation, Cyndy is passionate about supporting and expanding the community. She is excited to use her position to aid in educational outreach efforts as a driver to grow the set of diverse developers that make up the open source community, and is focused on advocating for inclusivity in the LLVM project.

Cyndy is a Compiler Engineer at Apple, Inc. concentrating on enhancing library support with Clang tooling.

Anton Korobeynikov:

Anton Korobeynikov has been an active contributor to the LLVM project since 2006. Over the years, he has made numerous technical contributions to areas including Windows support, ELF features, debug info, exception handling, and backends such as ARM and x86. He was the original author of the MSP430 and original System Z backends.

In addition to his technical contributions, Anton has maintained LLVM’s participation in Google Summer of Code by managing applications, deadlines, and overall organization. He also supports the LLVM infrastructure and has been on numerous program committees for the LLVM Developers’ Meetings (both US and EuroLLVM).

Anton is currently an associate professor at the Saint Petersburg State University and has served on the LLVM Foundation board of directors for the last 6 years.

Tanya Lattner:

Tanya Lattner has been involved in the LLVM project for over 16 years. She began as a graduate student who wrote her master’s thesis using LLVM, and continued on using and extending LLVM technologies at various jobs during her career as a compiler engineer.

Tanya has been organizing the US LLVM Developers’ meeting since 2008 and has attended every developer meeting. She was the LLVM release manager for 3 years, moderates the LLVM mailing lists, and helps administer the LLVM infrastructure servers, mailing lists, bugzilla, etc. Tanya has also been on the program committee for the US LLVM Developers’ meeting (4 years) and the EuroLLVM Developers’ Meeting (1 year).

Tanya is the Chief Operating Officer and has served as the President of the LLVM Foundation board for the last 6 years.

Chris Lattner:

Chris Lattner is the founder of the LLVM project and has a lengthy history of technical contributions to the project over the years. He drove much of the early implementation, architecture, and design of LLVM, Clang, and MLIR. Chris actively participates in the LLVM Developers’ meeting, served on the LLVM Board of Directors for many years, and helps drive important discussions and policy decisions related to the LLVM project.

Outside of LLVM, Chris has served in a wide range of technical leadership positions at Apple, Tesla, Google, and SiFive. These have spanned domains including compiler infrastructure, developer tools in general, machine learning infrastructure, autonomous vehicles, and microprocessor design. More details are available in his resumé page.

Tom Stellard:

Tom Stellard has been contributing to the LLVM project since 2012. He was the original author of the AMDGPU backend and was also an active contributor to libclc. He has been the LLVM project’s stable release manager since 2014.

Tom is currently a Software Engineer at Red Hat and is the technical lead for emerging toolchains including Clang/LLVM. He also maintains the LLVM packages for the Fedora project.

The New Clang _ExtInt Feature Provides Exact Bitwidth Integer Types

Tue, 21 Apr 2020 17:13:00 +0000

Author: Erich Keane, Compiler Frontend Engineer, Intel Corporation

Earlier this month I finally committed a patch to implement the extended-integer type class, _ExtInt, after nearly two and a half years of design and implementation. These types allow developers to use custom width integers, such as a 13-bit signed integer. This patch is currently designed to track N2472, a proposal being actively considered by the ISO WG14 C Language Committee. We feel that these types are going to be extremely useful to many downstream users of Clang, and that they provide a language interface for LLVM's extremely powerful integer type class.

Motivation

LLVM-IR has the ability to represent integers with a bitwidth from 1 all the way to 16,777,215 ((1<<24)-1); however, the C language is limited to just a few power-of-two sizes. Historically, these types have been sufficient for nearly all programming architectures, since power-of-two representation of integers is convenient and practical.

Recently, Field-Programmable Gate Array (FPGA) tooling, called High Level Synthesis Compilers (HLS), has become practical and powerful enough to use a general purpose programming language for their generation. These tools take C or C++ code and produce a transistor layout to be used by the FPGA. However, once programmers gained experience in these tools, it was discovered that the standard C integer types are incredibly wasteful for two main reasons.

First, a vast majority of the time programmers are not using the full width of their integer types. It is rare for someone to use all 16, 32, or 64 bits of their integer representation. On traditional CPUs this isn't much of a problem as the hardware is already in place, so having bits never set comes at zero cost. On the other hand, on FPGAs logic gates are an incredibly valuable resource, and HLS compilers should not be required to waste bits on large power of two integers when they only need a small subset of that! While the optimizer passes are capable of removing some of these widths, a vast majority of this hardware needs to be emitted.

Second, the C language requires that integers smaller than int are promoted to operations on the 'int' type. This further complicates hardware generation, as promotions to int are expensive and tend to stick with the operation for an entire statement at a time. These promotions typically have semantic meaning, so simply omitting them isn't possible without changing the meaning of the source code. Even worse, the proliferation of auto in user code results in the larger integer size being quite viral throughout a program.

The result is massively larger FPGA/HLS programs than the programmer needed, and likely much larger than they intended. Worse, there was no way for the programmer to express their intent in the cases where they do not need the full width of a standard integer type.

Using the _ExtInt Language Feature

The patch as accepted and committed into LLVM solves most of the above problems by providing the _ExtInt class of types. These types translate directly into the corresponding LLVM-IR integer types. The _ExtInt keyword is a type-specifier (like int) that accepts a required integral constant expression parameter representing the number of bits to be used. More succinctly: _ExtInt(7) is a signed integer type using 7 bits. Because it is a type-specifier, it can also be combined with signed and unsigned to change the signedness (and overflow behavior!) of the values. So "unsigned _ExtInt(9) foo;" declares a variable foo that is an unsigned integer type taking up 9 bits and represented as an i9 in LLVM-IR.
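As a rough illustration of the value ranges involved, here is a small Python model. It is a sketch rather than anything taken from the patch: the helper names are invented for this example, it assumes a two's-complement representation, and it only models the modulo-2^N wrap of the unsigned variant.

```python
def ext_int_range(bits, signed=True):
    """Return (min, max) representable by a two's-complement integer of `bits` bits."""
    if bits < 1:
        raise ValueError("bit width must be at least 1")
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def wrap_unsigned(value, bits):
    """Reduce `value` modulo 2**bits, like arithmetic on an unsigned N-bit integer."""
    return value & ((1 << bits) - 1)


# A signed 7-bit integer holds -64..63; an unsigned 9-bit integer holds 0..511.
print(ext_int_range(7))               # (-64, 63)
print(ext_int_range(9, signed=False)) # (0, 511)
print(wrap_unsigned(511 + 1, 9))      # 0
```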

The _ExtInt types as implemented do not participate in any implicit conversions or integer promotions, so all math done on them happens at the appropriate bit-width. The WG14 paper proposes integer promotion to the largest of the types (that is, adding an _ExtInt(5) and an _ExtInt(6) would result in an _ExtInt(6)); however, the implementation does not permit that, and _ExtInt(5) + _ExtInt(6) would result in a compiler error. This was done so that in the event that WG14 changes the design of the paper, we will be able to implement it without breaking existing programs. In the meantime, this can be worked around with explicit casts: (_ExtInt(6))AnExtInt5 + AnExtInt6 or static_cast<_ExtInt(6)>(AnExtInt5) + AnExtInt6.

Additionally, for C++, clang supports making the bitwidth parameter a dependent expression, so that the following is legal:
template<size_t WidthA, size_t WidthB>
_ExtInt(WidthA + WidthB) lossless_mul(_ExtInt(WidthA) a, _ExtInt(WidthB) b) {
  return static_cast<_ExtInt(WidthA + WidthB)>(a)
       * static_cast<_ExtInt(WidthA + WidthB)>(b);
}

We anticipate that this ability and these types will result in some extremely useful pieces of code, including novel uses of 256 bit, 512 bit, or larger integers, plus uses of 8 and 16 bit integers for those who can't afford promotions. For example, one can now trivially implement an extended integer type struct that does all operations provably losslessly, that is, adding two 6 bit values would result in a 7 bit value.
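The width arithmetic behind that claim can be checked by brute force. The Python sketch below is an illustration with invented helper names; it assumes signed two's-complement values and simply enumerates every operand pair for small widths.

```python
def signed_range(bits):
    """Inclusive (min, max) of a two's-complement signed integer with `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def fits(value, bits):
    """True if `value` is representable in a signed integer of `bits` bits."""
    lo, hi = signed_range(bits)
    return lo <= value <= hi


def all_values(bits):
    """Every value of a signed integer with `bits` bits."""
    lo, hi = signed_range(bits)
    return range(lo, hi + 1)


def sum_is_lossless(width):
    """True if every sum of two signed `width`-bit values fits in width + 1 bits."""
    return all(fits(a + b, width + 1)
               for a in all_values(width) for b in all_values(width))


def product_is_lossless(width_a, width_b):
    """True if every product of the two operand types fits in width_a + width_b bits."""
    return all(fits(a * b, width_a + width_b)
               for a in all_values(width_a) for b in all_values(width_b))
```

For example, sum_is_lossless(6) confirms that a 7-bit result is enough for two 6-bit operands, while fits(31 + 31, 6) shows that staying at 6 bits is not.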

In order to be consistent with the C Language, expressions that include a standard type will still follow integral promotion and conversion rules. All types smaller than int will be promoted, and the operation will then happen at the largest type. This can be surprising in the case where you add a short and an _ExtInt(15), where the result will be int. However, this ends up being the most consistent with the C language specification.

Additionally, when it comes to conversions, these types 'lose' to the C standard types of the same size or greater. So, an int added to an _ExtInt(32) will result in an int. However, an int added to an _ExtInt(33) will result in the latter. This is necessary to preserve C integer semantics.
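The rules in the last few paragraphs can be summarized as a toy model. This Python sketch is only an approximation for illustration: the Std and Ext classes and result_type are invented names, int is assumed to be 32 bits, signedness is ignored, and conversions between standard types are reduced to "the wider type wins".

```python
from dataclasses import dataclass

INT_BITS = 32  # assumption: a target where int is 32 bits wide


@dataclass(frozen=True)
class Std:
    """A standard C integer type, e.g. Std("short", 16)."""
    name: str
    bits: int


@dataclass(frozen=True)
class Ext:
    """An _ExtInt(bits) type."""
    bits: int


INT = Std("int", INT_BITS)


def promote(t):
    """Integer promotion: standard types narrower than int become int."""
    if isinstance(t, Std) and t.bits < INT_BITS:
        return INT
    return t  # _ExtInt types are never promoted


def result_type(lhs, rhs):
    """Type of `lhs + rhs` under the rules described in this post."""
    lhs, rhs = promote(lhs), promote(rhs)
    if isinstance(lhs, Ext) and isinstance(rhs, Ext):
        if lhs.bits != rhs.bits:
            # As implemented, mixing widths is a compile error.
            raise TypeError("mixing _ExtInt widths requires an explicit cast")
        return lhs
    if isinstance(lhs, Std) and isinstance(rhs, Std):
        return lhs if lhs.bits >= rhs.bits else rhs
    std, ext = (lhs, rhs) if isinstance(lhs, Std) else (rhs, lhs)
    # The standard type wins unless the _ExtInt is strictly wider.
    return ext if ext.bits > std.bits else std
```

With this model, result_type(Std("short", 16), Ext(15)) yields int, result_type(INT, Ext(32)) yields int, and result_type(INT, Ext(33)) yields Ext(33), matching the examples in the text.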

History

As mentioned earlier, this feature has been a long time coming! In fact, this is likely the fourth implementation that was done along the way in order to get to this point. Additionally, this is far from over: we very much hope that, upon acceptance of this by the WG14 Standards Committee, additional extensions and features will become available.

I was approached to implement this feature in the Fall of 2017 by my company's FPGA group, which had the problems mentioned above. They had attempted a solution that used some clever parsing to make these look like templates, and implemented them extensively throughout the compiler. As I was concerned about the flexibility and usability of these types in the type and template system, we opted to implement these as a type-attribute under the controversially named Arbitrary Precision Int (spelled __ap_int). This spelling was heavily influenced by the vector-types implementations in GCC and Clang.

We then were able to wrap a set of typedefs (or dependent __ap_int types) in a structure that provided exactly the C and C++ interface we wished to expose. As this was a then proprietary implementation, it was kept in our downstream implementation, where it received extensive testing and usage.

Roughly a year later (and a little more than a year ago from today!) I was authorized to contribute our implementation to the open source LLVM community! I decided to significantly refactor the implementation in order to better fit into the Clang type system, and uploaded it for review. This (now third!) implementation of this feature was proposed via RFC and code review at the same time.

While the usefulness was immediately acknowledged, it was rejected by the Clang code owner for two reasons: First the spelling was considered unpalatable, and Second it was a pure extension without standardization. This began the nearly year-long effort to come up with a standards proposal that would better define and describe the feature as well as come up with a spelling that was more in line with the standard language.

Thanks to the invaluable feedback and input from Richard Smith, my coworkers Melanie Blower and Tommy Hoffner and I were able to propose the spelling _ExtInt for standardization. Additionally, the feature was re-implemented at the beginning of this year and eventually accepted and committed!

The standardization paper (N2472) was presented at this Spring's WG14 ISO C Language Committee Meeting where it received near unanimous support. We expect to have an updated version of the paper with wording ready for the next WG14 meeting, where we hope it will receive sufficient support to be accepted into the language.

Future Extensions

While the feature as committed in Clang is incredibly useful, it can be taken further. There are a handful of future extensions that we wish to implement once guidance from WG14 has been given on their direction and implementation.

First, we believe the special integer promotion/conversion rules, which omit automatic promotion to int and instead provide operations at the largest type, are both incredibly useful and powerful. While we have received positive encouragement from WG14, we hope that the wording paper we provide will both clarify the mechanism and definition in a way that supports all common uses.

Secondly, we would like to choose a printf/scanf specifier that permits specifying the type for the C language. This was the topic of the WG14 discussion, and also received strong encouragement. We intend to come up with a good representation, then implement this in major implementations.

Finally, numerous people have suggested implementing a way of spelling literals of this type. This is important for two reasons: First, it allows using literals without casts in expressions in a way that doesn't run afoul of promotion rules. Second, it provides a way of spelling integer literals larger than UINTMAX_MAX, which can be useful for initializing the larger versions of these types. While the spelling is undecided, we intend something like: 1234X would result in an integer literal with the value 1234 represented in an _ExtInt(11), which is the smallest type capable of storing this value.

However, without the integer promotion/conversion rules above, this feature isn't nearly as useful. Additionally, we'd like to be consistent with whatever the C language committee chooses. As soon as we receive positive guidance on the spelling and syntax of this type, we look forward to providing an implementation.

Conclusion

In closing, we encourage you to try using this and provide feedback to me, my proposal co-authors, and the C committee itself! We feel this is a really useful feature and would love to get as much user experience as possible. Feel free to contact me and my co-authors with any questions or concerns!

- Erich Keane, Intel Corporation

Deterministic builds with clang and lld

Thu, 07 Nov 2019 12:34:00 +0000

Deterministic builds can lower continuous integration costs and give you more confidence in your build and test process. This post outlines what it means for a build to be deterministic, the advantages of deterministic builds, and how to achieve them using LLVM tools.

What is a deterministic build, and its advantages

A build is called deterministic or reproducible if running it twice produces exactly the same build outputs.

There are several degrees of build determinism that are increasingly useful but increasingly difficult to achieve:

Basic determinism: Doing a full build of the same source code in the same directory on the same machine produces exactly the same output every time, in the sense that a content hash of the final build artifacts and of all intermediate files does not change.

Once you have this, if all your builders are configured the same way (OS version, toolchain, build path, checkout path, …), they can share build artifacts, for example by using distcc.
This also allows local caching of test suite results keyed by a hash of test binary and test input files.
Illustrative example: ./build src out ; mv out out.old ; ./build src out ; diff -r out out.old

Incremental basic determinism: Like basic determinism, but the output binaries also don’t change in partial rebuilds. In build systems that track file modification times to decide when to rebuild, this means for example that updating the modification time on a C++ source file (without doing any actual changes) and rebuilding will produce the same output as a full build.

This allows having build bots that don’t do full builds each time, while still allowing caching of compile artifacts and test results.
Illustrative example: ./build src out ; cp -r out out.old ; touch src/foo.c ; ./build src out ; diff -r out out.old

Local determinism: Like incremental basic determinism, but builds are also independent of the name of the build directory. Builds of the same source code on the same machine produce exactly the same output every time, independent of the location of the source checkout directory or the build directory.

This allows machines to have several build directories at different locations but still share compile and test caches.
Illustrative example: cp -r src src2 ; ./build src out ; ./build src2 out2 ; diff -r out out2

Universal determinism: Like local determinism, but builds are also independent of the machine the build runs on. Everybody that checks out the project at a given revision into any directory and builds it following the build instructions ends up with exactly the same bits in the build output.

Since exact local OS and locally installed packages no longer matter, this allows devs to share compile and test caches with bots, without having to use difficult-to-setup containers.
It also allows easy verification of builds done by others to make sure output binaries haven’t been tampered with.
Illustrative example: ./build src out ; ssh remote ./build src out && scp remote:out out2 ; diff -r out out2

Plan of attack

To make sure that a deterministic build stays deterministic, you should set up a builder that verifies that your build is deterministic. Even if your build isn’t deterministic yet, you can set up a bot that verifies that some parts of your build are deterministic and then expand the checks over time.

For example, you could have a bot that does a full build in a fixed build directory, moves the build artifacts out of the way, and then does another full build. Once your compiles have basic determinism, add a step that checks that object files in the two build directories are the same. You could even add incremental checking for specific subdirectories or build targets while you work towards full basic determinism.

Once your links are deterministic, check that binaries are identical as well. Once all your build steps are deterministic, compare all files in the two build directories.

Once your build has incremental determinism, do an incremental build for the first build and a full build for the second build. Once your build has local determinism, do the two builds at different build paths.
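One way to implement such a check is to hash every file in both build directories and report the paths that differ, much like diff -r in the illustrative examples above. The following Python sketch does that; the function names and the optional suffix filter are assumptions made for this example, not part of any existing tool.

```python
import hashlib
import os


def hash_tree(root):
    """Map each file's path (relative to `root`) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def differing_files(build_a, build_b, suffixes=None):
    """Return sorted relative paths that are missing from one build or differ in content.

    `suffixes` optionally restricts the check, e.g. (".o", ".obj") while only
    the compile steps are deterministic.
    """
    a, b = hash_tree(build_a), hash_tree(build_b)
    paths = set(a) | set(b)
    if suffixes is not None:
        paths = {p for p in paths if p.endswith(tuple(suffixes))}
    return sorted(p for p in paths if a.get(p) != b.get(p))
```

A bot could start with differing_files("out", "out.old", suffixes=(".o",)) and drop the suffix filter once links and all other build steps are deterministic.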

Getting to basic determinism

Basic determinism needs tools (compiler, linker, etc) that are deterministic. Tools internally must not output things in hash table order, multi-threaded programs must not write output in the order threads finish, etc. All of LLVM’s tools have deterministic outputs when run with the right flags but not necessarily by default.
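As a small illustration of both pitfalls, the Python sketch below sorts a set before writing it out and collects the results of parallel jobs in input order rather than completion order. The function names are invented for this example and the "compile" step is a placeholder callback.

```python
from concurrent.futures import ThreadPoolExecutor


def write_symbol_list(symbols):
    """Serialize a set of symbol names in sorted order, one per line.

    Iterating the set directly would follow hash-table order, which for
    Python strings can change from one process to the next.
    """
    return "".join(name + "\n" for name in sorted(symbols))


def compile_all(sources, compile_one, jobs=4):
    """Run `compile_one` on every source in parallel; return results in input order."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() yields results in the order of `sources`, not in completion order.
        return list(pool.map(compile_one, sources))
```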

The C standard defines the predefined macros __TIME__ and __DATE__ that expand to the time a source file is compiled. Several compilers, including clang, also define the non-standard __TIMESTAMP__. This is inherently nondeterministic. You should not use these macros, and you can use -Wdate-time to make the compiler emit a warning when they are used.

If they are used in third-party code you don’t control, you can use -Wno-builtin-macro-redefined -D__DATE__= -D__TIME__= -D__TIMESTAMP__= to make them expand to nothing.

When targeting Windows, clang and clang-cl by default also embed the current time in a timestamp field in the output .obj file, because Microsoft’s link.exe in /incremental mode silently mislinks files if that field isn’t set correctly. If you don’t use link.exe’s /incremental flag, or if you link with lld-link, you should pass /Brepro to clang-cl to make it not write the current timestamp into its output.

Both link.exe and lld-link also write the current timestamp into output .dll or .exe files. To make them instead write a hash of the binary into this field, you can pass /Brepro to the linker as well. However, some tools, such as Windows 7’s app compatibility database, try to interpret that field as an actual timestamp and can get confused if it’s set to a hash of the binary. For this case, lld-link also offers a /timestamp: flag that you can give an explicit timestamp that’s written into the output. You could use this, for example, to write the time of the commit the code is built at instead of the current time to make it deterministic. (But see the footnote on embedding commit hashes below.)

Visual Studio’s assemblers ml.exe and ml64.exe also insist on writing the current time into their output. In situations like this, where you can’t easily fix the tool to write the right output in the first place, you need to write wrappers that fix up the file after the fact. As an example, ml.py is the wrapper the Chromium project uses to make ml’s output deterministic.

macOS’s libtool and ld64 also insist on writing timestamps into their outputs. You can set the environment variable ZERO_AR_DATE to 1 in a wrapper to make their output deterministic, but that confuses lldb of older Xcode versions.
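A wrapper of that kind can be very small. The sketch below is a hypothetical Python helper (the function name is made up) that runs an arbitrary command with ZERO_AR_DATE set to 1 in a copy of the current environment.

```python
import os
import subprocess


def run_with_zero_ar_date(command):
    """Run `command` (a list of arguments) with ZERO_AR_DATE=1 in its environment.

    Raises subprocess.CalledProcessError if the command exits with a failure.
    """
    env = dict(os.environ)       # copy, so the caller's environment is untouched
    env["ZERO_AR_DATE"] = "1"
    return subprocess.run(command, env=env, check=True)
```

A build rule could then call run_with_zero_ar_date(["libtool", "-static", "-o", "libfoo.a", "foo.o"]) instead of invoking libtool directly; the file names here are placeholders.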

GCC sometimes uses random numbers in certain symbol mangling situations. Clang does not do this, so there’s no need to pass -frandom-seed to clang.

It’s a good idea to make your build independent of environment variables as much as possible, so that accidental local changes in the environment don’t affect the build output. You should pass /X to clang-cl to make it ignore %INCLUDE% and explicitly pass system include directories via the -imsvc switch instead. Likewise, very new lld-link versions (LLVM 10 and newer, at the time of this writing still unreleased) understand the /lldignoreenv flag, which makes lld-link ignore the %LIB% environment variable; explicitly pass system library directories via /libpath:.

Footnote on embedding git hashes into the binary
It might be tempting to embed the git commit hash or svn revision that a binary was built at into the binary’s --version output, or use the revision as a cache key to invalidate on-disk caches when the version changes.

This doesn’t affect your build’s determinism, but it does affect the hit rate if you’re using deterministic builds to cache test run results. If your binary embeds the current commit, it is guaranteed to change on every single commit, and you won’t be able to cache test results across commits. Even commits that just fix typos in comments, add non-code documentation, or that only affect code used by some but not all of your binaries will change every binary.

For cache invalidation, consider using something finer-grained, such as only the latest commit of the directory containing the cache handling code, or the hash of all source files containing the cache handling code.
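A content-based key along those lines could look like the following Python sketch. The function name is hypothetical, and the choice of SHA-256 and of mixing the file names into the hash are assumptions made for the example.

```python
import hashlib


def cache_key(paths):
    """Hash the names and contents of the given source files into one hex digest.

    Modification times and commit hashes are deliberately not part of the key,
    so commits that do not touch these files leave it unchanged. Pass paths
    relative to the source root to keep the key independent of the checkout
    location.
    """
    h = hashlib.sha256()
    for path in sorted(paths):
        with open(path, "rb") as f:
            data = f.read()
        h.update(path.encode("utf-8") + b"\0")
        h.update(len(data).to_bytes(8, "little"))  # length prefix avoids ambiguity
        h.update(data)
    return h.hexdigest()
```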

For --version output, if your build is fully deterministic, the hash of the binary itself (and its dynamic library dependencies) can serve as a stable version identifier. You can keep a map of binary hash to all commit hashes that produce that binary somewhere.

Windows only: For the same reason, just using the timestamp of the latest commit as a /timestamp: might not be the best option. Rounding the timestamp of the latest commit to 6h (or similar) granularity is a possible approach for not having the timestamp change the binary on every commit, while still keeping the timestamp close to reality. For production builds, the symbol server key for binaries is a (executable size, timestamp) pair, so here having fairly granular timestamps is important to not map binaries from consecutive commits to the same symbol server key. Depending on how often you push production binaries to your symbol servers, you might want to use the timestamp of the latest commit as /timestamp: for official builds, or you might want to round to finer granularity than you do on dev builds.
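The rounding itself is simple arithmetic on the commit's Unix timestamp. A minimal Python sketch, assuming timestamps in seconds and rounding down (the function name and the default granularity of six hours are just for illustration):

```python
SIX_HOURS = 6 * 60 * 60  # granularity in seconds


def round_down_timestamp(commit_time, granularity=SIX_HOURS):
    """Round a Unix timestamp (in seconds) down to a multiple of `granularity`."""
    return commit_time - commit_time % granularity
```

All commits that fall into the same six-hour window then produce the same value, which can be handed to /timestamp: for dev builds; official builds could call the function with a smaller granularity.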

Getting to incremental determinism

Having deterministic incremental builds mostly requires having correct incremental builds, meaning that if a file is changed and the build reruns, everything that uses this file needs to be rebuilt.

This is very build system dependent, so this post can’t say much about it.

In general, every build step needs to correctly declare all the inputs it depends on.

Some tools, such as Visual Studio’s link.exe in /incremental mode, by design write a different output every time. Don’t use inherently incrementally non-deterministic tools like that if you care about build determinism.

The build should not depend on environment variables, since build systems usually don’t model dependencies on environment variables.

Getting to local determinism

Making build outputs independent of the names of the checkout or build directory means that build outputs must not contain absolute paths, or relative paths that contain the name of either directory.

A possible way to arrange for that is to put all build directories into the checkout directory. For example, if your code is at path/to/src, then you could have “out” in your .gitignore and build directories at path/to/src/out/debug, path/to/src/out/release, and so on. The relative path from each build artifact to the source is then “../../” followed by the path of the source file in the source directory, which is identical for each build directory.

The C standard defines the predefined macro __FILE__ that expands to the name of the current source file. Clang expands this to an absolute path if it is invoked with an absolute path (`clang -c /absolute/path/to/my/file.cc`), and to a relative path if it is invoked with a relative path (`clang ../../path/to/my/file.cc`). To make your build locally deterministic, pass relative paths to your .cc files to clang.

By default, clang will internally use absolute paths to refer to compiler-internal headers. Pass -no-canonical-prefixes to make clang use relative paths for these internal files.

Passing relative paths to clang makes clang expand __FILE__ to a relative path, but paths in debug information are still absolute by default. Pass -fdebug-compilation-dir . to make paths in debug information relative to the build directory. (Before LLVM 9, this is an internal clang flag that must be used as `-Xclang -fdebug-compilation-dir -Xclang .`) When using clang’s integrated assembler (the default), -Wa,-fdebug-compilation-dir,. will do the same for object files created from assembly input. (For ml.exe / ml64.exe, see the script linked to from the “Basic determinism” section above.)
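Putting the flags from this section together, a build script might assemble its compile command as in the Python sketch below. This is an assumption-laden illustration: the function name and directory layout are hypothetical, and the command is meant to be run from inside the build directory.

```python
import os


def clang_compile_command(source, build_dir, output):
    """Build a clang command line that refers to `source` relative to `build_dir`.

    The command is meant to be run with `build_dir` as the working directory.
    """
    rel_source = os.path.relpath(source, build_dir)
    return [
        "clang",
        "-Wdate-time",                   # warn about __DATE__, __TIME__, __TIMESTAMP__
        "-no-canonical-prefixes",        # relative paths for compiler-internal headers
        "-fdebug-compilation-dir", ".",  # relative paths in debug info (LLVM 9 and newer)
        "-c", rel_source,
        "-o", output,
    ]
```

Because only relative paths appear in the result, two checkouts at different locations yield identical command lines, which is what lets them share a compile cache.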

Using this means that debuggers won’t automatically find the source code belonging to your binary. At the moment, there’s no way to tell debuggers to resolve relative paths relative to the location of the binary (DWARF proposal, gdb patch). See the end of this section for how to configure common debuggers to work correctly.

There are a few flags that try to make compilers produce relative paths in outputs even if the filename passed to the compiler is absolute (-fdebug-prefix-map, -ffile-prefix-map, -fmacro-prefix-map). Do not use these flags.

They work by adding lhs=rhs replacement patterns, and the lhs must be an absolute path to remove the absolute path from the output. That means that while they make the compile output path-independent, they make the compile command itself path-dependent, which hinders distributed compile caching. With -grecord-gcc-switches or -frecord-gcc-switches the compile command is embedded in debug info or even the object file itself, so in that case the flags even break local determinism. (Both -grecord-gcc-switches and -frecord-gcc-switches default to false in clang.)
They don’t affect the paths in dwo files when using fission; passing relative paths to the compiler is the only way to make these paths relative.

On Windows, it’s very unusual to have PDBs with relative paths. You can pass /pdbsourcepath:X:\fake\prefix to lld-link to make it resolve all relative paths in object files against a fixed absolute path to make sure your final PDBs contain absolute paths. Since the absolute path is against a fixed prefix, this doesn’t impair determinism. With this, both binaries and PDBs created by clang-cl and lld-link will be fully deterministic and build path independent.

Also on Windows, the linker by default puts the absolute path to the generated PDB file in the output binary. Pass /pdbaltpath:%_PDB% when you pass /debug to make the linker write a relative path to the generated PDB file instead. If you have custom build steps that extract PDB names from binaries, you have to make sure these scripts work with relative paths. Microsoft’s tools (debuggers, ETW) work fine with this set in most situations, and you can add a symbol search path in the cases where they don’t (when the binaries are copied before being run).

Getting debuggers to work well with locally deterministic builds
At the moment, no debugger offers an option to resolve relative paths in debug info against the directory the debugged binary is in.

Some debuggers (gdb, lldb) do try to resolve relative paths against the cwd, so a simple way to make debugging work is to cd into your build directory before debugging.

If you don’t want to require devs to cd into the build directory for debugging to work, you have to do debugger-specific configuration tweaks.

To make sure devs don’t miss this, you could have your custom init script set an env var and query if it’s set early during your test binary startup, and exit with a message like “Add `source /path/to/your/project/gdbinit` to your ~/.gdbinit” if the environment variable isn’t set.

gdb
`dir path/to/build/dir` tells gdb what directory to resolve relative paths against.

`show debug-file-directory` prints the list of directories gdb looks in for dwo files. Query that, append `:path/to/build/dir`, and call `set debug-file-directory` to add your build dir to that search path.

For an example, see Chromium’s gdbinit (which also does a few other unrelated things).

lldb
`settings set target.source-map ../.. /absolute/path/to/build/dir` can map the “../..” prefix that all .cc files will refer to when using the setup described above with an absolute path. This requires Xcode 10.3 or newer; the lldb shipping with Xcode 10.1 has problems with this setup.

For an example, see Chromium’s lldbinit.

Visual Studio’s debugger and windbg
If you use the setup described above, /PDBSourcePath:X:\fake\prefix will combine with the “..\..\my\file.cc” relative paths to make your code appear at “X:\my\file.cc”. To make Windows debuggers find them, you have two options:

Run `subst X: C:\src\real\root` in cmd.exe before launching the debuggers to create a virtual drive that maps X: to the actual source location. Both windbg and Visual Studio will load code over X: this way.
Add “C:\src\real\root” to each debugger’s source search path.

Windbg: Run `.srcpath+ C:\src\real\root`. You can also set this via the _NT_SOURCE_PATH environment variable, or via File->Source File Path (Ctrl+P). Or pass `-srcpath C:\src\real\root` when launching windbg from the command line.
Visual Studio: The IDE has a “Debug Source Files” property. Add C:\src\real\root under Project->Properties (Alt+F7)->Common Properties->Debug Source Files->Directories containing source code.

Alternatively, you could pass the absolute path to the actual build directory to /PDBSourcePath: instead of something like “X:\fake\prefix”. That way, all PDBs have “correct” absolute paths in them, while your compile steps are still path-independent and can share a cache across machines. However, since executables contain a reference to the PDB hash, none of your binaries will be path-independent. This setup doesn’t require any debugger configuration, but it doesn’t allow your builds to be locally deterministic.

Getting to universal determinism

By now, your build output is deterministic as long as everyone uses the same compiler and linker binaries, and as long as everyone uses the same version of the SDK and system libraries.

Making your build independent of that requires making sure that everyone automatically uses the same compiler, linker, and SDK.

This might seem like a lot of work, but in addition to build determinism this work also gives you cross builds (where you can e.g. build the Linux version of your product on a Windows host).

It also versions the compiler, linker, and SDK used within your code, which means you will be able to update all your bots and devs to new versions automatically (and if an update causes issues, it’s easy to revert it).

You need to store the currently-used compiler, linker, and SDK versions in a file in your source control repository, and from some kind of hook that runs after pulling the newest version of the source, download compiler, linker, and SDK of the right version from some kind of cloud storage service.
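The version check in such a hook can be sketched as follows. Everything specific here is hypothetical: the JSON manifest format, the VERSION stamp file next to each installed tool, and the function name are assumptions for illustration, and the actual download step is left out.

```python
import json
import os


def stale_tools(manifest_path, install_root):
    """Return the names of pinned tools whose installed version is missing or outdated.

    `manifest_path` is a JSON file checked into the repository that maps tool
    names to pinned versions, e.g. {"clang": "r100", "sdk": "10.15"} (made-up
    values). Each installed tool is expected to carry a stamp file at
    <install_root>/<name>/VERSION.
    """
    with open(manifest_path) as f:
        pinned = json.load(f)
    stale = []
    for name, version in sorted(pinned.items()):
        stamp = os.path.join(install_root, name, "VERSION")
        try:
            with open(stamp) as f:
                installed = f.read().strip()
        except FileNotFoundError:
            installed = None
        if installed != version:
            stale.append(name)  # the hook would download this tool next
    return stale
```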

You then need to modify your build files to use --sysroot (Linux), -isysroot (macOS), -imsvc (Windows) to use these hermetic SDKs for builds. They need to be somewhere below your source root to not regress build directory name invariance.

You also want to make sure your build doesn’t depend on environment variables, as already mentioned in the “Getting to incremental determinism” section, since environments between different machines can be very different and difficult to control.

Build steps shouldn’t embed the hostname of the current machine or the logged-in user name in the build output, or similar.

Summary

This post explained what deterministic builds are, how build determinism spans a spectrum (local, fixed-build-dir-path-only to fully host-OS-independent) instead of just being binary, and how you can use LLVM’s tools to make your build deterministic. It also touched on techniques you can use to make your test caches more effective.

Thanks to Elly Fong-Jones for helping edit and structure this post, and to Adrian McCarthy, Bob Haarman, Bruce Dawson, Dirk Pranke, Fumitoshi Ukai, Hans Wennborg, Kai Naschinski, Reid Kleckner, Rui Ueyama, and Takuto Ikuta for reading drafts and suggesting improvements.

Closing the gap: cross-language LTO between Rust and C/C++

Thu, 19 Sep 2019 05:15:00 +0000

Link time optimization (LTO) is LLVM's way of implementing whole-program optimization. Cross-language LTO is a new feature in the Rust compiler that enables LLVM's link time optimization to be performed across a mixed C/C++/Rust codebase. It is also a feature that beautifully combines two respective strengths of the Rust programming language and the LLVM compiler platform:

Rust, with its lack of a language runtime and its low-level reach, has an almost unique ability to seamlessly integrate with an existing C/C++ codebase, and
LLVM, as a language agnostic foundation, provides a common ground where the source language a particular piece of code was written in does not matter anymore.

So, what does cross-language LTO do? There are two answers to that:

From a technical perspective it allows for codebases to be optimized without regard for implementation language boundaries, making it possible for important optimizations, such as function inlining, to be performed across individual compilation units even if, for example, one of the compilation units is written in Rust while the other is written in C++.
From a psychological perspective, which arguably is just as important, it helps to alleviate the nagging feeling of inefficiency that many performance conscious developers might have when working on a piece of software that jumps back and forth a lot between functions implemented in different source languages.

Because Firefox is a large, performance sensitive codebase with substantial parts written in Rust, cross-language LTO has been a long-time favorite wish list item among Firefox developers. As a consequence, we at Mozilla's Low Level Tools team took it upon ourselves to implement it in the Rust compiler.

To explain how cross-language LTO works it is useful to take a step back and review how traditional compilation and "regular" link time optimization work in the LLVM world.

Background - A bird's eye view of the LLVM compilation pipeline

Clang and the Rust compiler both follow a similar compilation workflow which, to some degree, is prescribed by LLVM:

The compiler front-end generates an LLVM bitcode module (.bc) for each compilation unit. In C and C++ each source file will result in a single compilation unit. In Rust each crate is translated into at least one compilation unit.
```
 .c --clang--> .bc

 .c --clang--> .bc


 .rs --+
       |
 .rs --+--rustc--> .bc
       |
 .rs --+
```

In the next step, LLVM's optimization pipeline will optimize each LLVM module in isolation:


 .c --clang--> .bc --LLVM--> .bc (opt)

 .c --clang--> .bc --LLVM--> .bc (opt)


 .rs --+
       |
 .rs --+--rustc--> .bc --LLVM--> .bc (opt)
       |
 .rs --+

LLVM then lowers each module into machine code so that we get one object file per module:


 .c --clang--> .bc --LLVM--> .bc (opt) --LLVM--> .o

 .c --clang--> .bc --LLVM--> .bc (opt) --LLVM--> .o


 .rs --+
       |
 .rs --+--rustc--> .bc --LLVM--> .bc (opt) --LLVM--> .o
       |
 .rs --+

Finally, the linker will take the set of object files and link them together into a binary:


 .c --clang--> .bc --LLVM--> .bc (opt) --LLVM--> .o ------+
                                                          |
 .c --clang--> .bc --LLVM--> .bc (opt) --LLVM--> .o ------+
                                                          |
                                                          +--ld--> bin
 .rs --+                                                  |
       |                                                  |
 .rs --+--rustc--> .bc --LLVM--> .bc (opt) --LLVM--> .o --+
       |
 .rs --+

This is the regular compilation workflow if no kind of LTO is involved. As you can see, each compilation unit is optimized in isolation. The optimizer does not know the definition of functions inside of other compilation units and thus cannot inline them or make other kinds of decisions based on what they actually do. To enable inlining and optimizations to happen across compilation unit boundaries, LLVM supports link time optimization.

Link time optimization in LLVM

The basic principle behind LTO is that some of LLVM's optimization passes are pushed back to the linking stage. Why the linking stage? Because that is the point in the pipeline where the entire program (i.e. the whole set of compilation units) is available at once and thus optimizations across compilation unit boundaries become possible. Performing LLVM work at the linking stage is facilitated via a plugin to the linker.

Here is how LTO is concretely implemented:

the compiler translates each compilation unit into LLVM bitcode (i.e. it skips lowering to machine code),
the linker, via the LLVM linker plugin, knows how to read LLVM bitcode modules like regular object files, and
the linker, again via the LLVM linker plugin, merges all bitcode modules it encounters and then runs LLVM optimization passes before doing the actual linking.

With these capabilities in place a new compilation workflow with LTO enabled for C++ code looks like this:


    .c --clang--> .bc --LLVM--> .bc (opt) ------------------+ - - +
                                                            |     |
    .c --clang--> .bc --LLVM--> .bc (opt) ------------------+ - - +
                                                            |     |
                                                            +-ld+LLVM--> bin
    .rs --+                                                 |
          |                                                 |
    .rs --+--rustc--> .bc --LLVM--> .bc (opt) --LLVM--> .o -+
          |
    .rs --+

As you can see our Rust code is still compiled to a regular object file. Therefore, the Rust code is opaque to the optimization taking place at link time. Yet, looking at the diagram it seems like that shouldn't be too hard to change, right?

Cross-language link time optimization

Implementing cross-language LTO is conceptually simple because the feature is built on the shoulders of giants. Since the Rust compiler uses LLVM all the important building blocks are readily available. The final diagram looks very much as you would expect, with rustc emitting optimized LLVM bitcode and the LLVM linker plugin incorporating that into the LTO process with the rest of the modules:


    .c --clang--> .bc --LLVM--> .bc (opt) ---------+
                                                   |
    .c --clang--> .bc --LLVM--> .bc (opt) ---------+
                                                   |
                                                   +-ld+LLVM--> bin
    .rs --+                                        |
          |                                        |
    .rs --+--rustc--> .bc --LLVM--> .bc (opt) -----+
          |
    .rs --+

Nonetheless, achieving a production-ready implementation still turned out to be a significant time investment. After figuring out how everything fits together, the main challenge was to get the Rust compiler to produce LLVM bitcode that was compatible with both the bitcode that Clang produces and with what the linker plugin would accept. Some of the issues we ran into were:

The Rust compiler and Clang are both based on LLVM but they might be using different versions of LLVM. This was further complicated by the fact that Rust's LLVM version often does not match a specific LLVM release, but can be an arbitrary revision from LLVM's repository. We learned that all LLVM versions involved really have to be a close match in order for things to work out. The Rust compiler's documentation now offers a compatibility table for the various versions of Rust and Clang.
The Rust compiler by default performs a special form of LTO, called ThinLTO, on all compilation units of the same crate before passing them on to the linker. We quickly learned, however, that the LLVM linker plugin crashes with a segmentation fault when trying to perform another round of ThinLTO on a module that had already gone through the process. No problem, we thought and instructed the Rust compiler to disable its own ThinLTO pass when compiling for the cross-language case and indeed everything was fine -- until the segmentation faults mysteriously returned a few weeks later even though ThinLTO was still disabled.

We noticed that the problem only occurred in a specific, presumably innocent setting: again two passes of LTO needed to happen, this time the first was a regular LTO pass within rustc and the output of that would then be fed into ThinLTO within the linker plugin. This setup, although computationally expensive, was desirable because it produced faster code and allowed for better dead-code elimination on the Rust side. And in theory it should have worked just fine. Yet somehow rustc produced symbol names that had apparently gone through ThinLTO's mangling even though we checked time and again that ThinLTO was disabled for Rust. We were beginning to seriously question our understanding of LLVM's inner workings as the problem persisted while we slowly ran out of ideas on how to debug this further.

You can picture the proverbial lightbulb appearing over our heads when we figured out that Rust's pre-compiled standard library would still have ThinLTO enabled, no matter the compiler settings we were using for our tests. The standard library, including its LLVM bitcode representation, is compiled as part of Rust's binary distribution so it is always compiled with the settings from Rust's build servers. Our local full LTO pass within rustc would then pull this troublesome bitcode into the output module which in turn would make the linker plugin crash again. Since then ThinLTO is turned off for libstd by default.
After the above fixes, we succeeded in compiling the entirety of Firefox with cross-language LTO enabled. Unfortunately, we discovered that no actual cross-language optimizations were happening. Both Clang and rustc were producing LLVM bitcode and LLD produced functioning Firefox binaries, but when looking at the machine code, not even trivial functions were being inlined across language boundaries. After days of debugging (and unfortunately without being aware of LLVM's optimization remarks at the time) it turned out that Clang was emitting a target-cpu attribute on all functions while rustc didn't, which made LLVM reject inlining opportunities.

In order to prevent the feature from silently regressing for similar reasons in the future we put quite a bit of effort into extending the Rust compiler's testing framework and CI. It is now able to compile and run a compatible version of Clang and uses that to perform end-to-end tests of cross-language LTO, making sure that small functions will indeed get inlined across language boundaries.

This list could still go on for a while, with each additional target platform holding new surprises to be dealt with. We had to progress carefully by putting in regression tests at every step in order to keep the many moving parts in check. At this point, however, we feel confident in the underlying implementation, with Firefox providing a large, complex, multi-platform test case where things have been working well for several months now.

Using cross-language LTO: a minimal example

The exact build tool invocations differ depending on whether it is rustc or Clang performing the final linking step, and whether Rust code is compiled via Cargo or via rustc directly. Rust's compiler documentation describes the various cases. The simplest of them, where rustc directly produces a static library and Clang does the linking, looks as follows:


 # Compile the Rust static library, called "xyz"
 rustc --crate-type=staticlib -O -C linker-plugin-lto -o libxyz.a lib.rs

 # Compile the C code with "-flto"
 clang -flto -c -O2 main.c

 # Link everything
 clang -flto -O2 main.o -L . -lxyz

The -C linker-plugin-lto option instructs the Rust compiler to emit LLVM bitcode which then can be used for both "full" and "thin" LTO. Getting things set up for the first time can be quite cumbersome because, as already mentioned, all compilers and the linker involved must be compatible versions. In theory, most major linkers will work; in practice LLD seems to be the most reliable one on Linux, with Gold in second place and the BFD linker needing to be at least version 2.32. On Windows and macOS the only linkers properly tested are LLD and ld64 respectively. For ld64 Firefox uses a patched version because the LLVM bitcode that rustc produces likes to trigger a pre-existing issue this linker has with ThinLTO.
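The commands above assume a lib.rs that exports something main.c can call, but the post does not show one. As a minimal sketch, assuming a made-up function name (xyz_add is hypothetical and not part of any real library), the Rust side could look like this:

```rust
// lib.rs: hypothetical contents for the "xyz" static library built above.
// The function name `xyz_add` is an illustrative assumption.

/// A small function exported with the C ABI so that C code can call it.
/// Small leaf functions like this are the typical candidates for inlining
/// across the language boundary once cross-language LTO is active.
///
/// `#[no_mangle]` keeps the symbol name as `xyz_add`. This spelling works when
/// rustc is invoked directly as in the commands above; the Rust 2024 edition
/// spells the attribute `#[unsafe(no_mangle)]` instead.
#[no_mangle]
pub extern "C" fn xyz_add(a: u32, b: u32) -> u32 {
    // Wrapping addition never panics, so no unwinding crosses into C.
    a.wrapping_add(b)
}
```

On the C side, main.c would declare the function as uint32_t xyz_add(uint32_t, uint32_t); and call it like any other C function. Whether the call actually gets inlined depends on the compiler and linker versions being compatible, as described above.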

Conclusion

Cross-language LTO has been enabled for Firefox release builds on Windows, macOS, and Linux for several months at this point and we at Mozilla's Low Level Tools team are pleased with how it turned out. While we still need to work on making the initial setup of the feature easier, it already enabled removing duplicated logic from Rust components in Firefox because now code can simply call into the equivalent C++ implementation and rely on those calls to be inlined. Having cross-language LTO in place and continuously tested will definitely lower the psychological bar for implementing new components in Rust, even if they are tightly integrated with existing C++ code.

Cross-language LTO has been available in the Rust compiler since version 1.34 and works together with Clang 8. Feel free to give it a try and report any problems in the Rust bug tracker.

Acknowledgments

I'd like to thank my Low Level Tools team colleagues David Major, Eric Rahm, and Nathan Froyd for their invaluable help and encouragement, and I'd like to thank Alex Crichton for his tireless reviews on the Rust side.

Announcing the program for the 2019 LLVM Developers' Meeting - Bay Area

Wed, 04 Sep 2019 07:16:00 +0000

Announcing the program for the 2019 LLVM Developers' Meeting in San Jose, CA! This program is the largest we have ever had and has over 11 tutorials, 29 technical talks, 24 lightning talks, 2 panels, 3 birds of a feather, 14 posters, and 4 SRC talks. Be sure to register to attend this event and hear some of these great talks.

Keynotes

Generating Optimized Code with GlobalISel - Volkan Keles, Daniel Sanders
Even Better C++ Performance and Productivity: Enhancing Clang to Support Just-in-Time Compilation of Templates - Hal Finkel

Technical Talks

Using LLVM's portable SIMD with Zig - Shawn Landden
Code-Generation for the Arm M-profile Vector Extension - Sjoerd Meijer
Alive2: Verifying Existing Optimizations - Nuno Lopes
The clang constexpr interpreter - Nandor Licker
Souper-Charging Peepholes with Target Machine Info - Min-Yih Hsu
Transitioning the Networking Software Toolchain to Clang/LLVM - Ivan Baev, Jeremy Stenglein, Bharathi Seshadri
Link Time Optimization For Swift - Jin Lin
Hot Cold Splitting Optimization Pass In LLVM - Aditya Kumar
Making UB hurt less: security mitigations through automatic variable initialization - JF Bastien
Propeller: Profile Guided Large Scale Performance Enhancing Relinker - Sriraman Tallam
From C++ for OpenCL to C++ for accelerator devices - Anastasia Stulova
LLVM-Canon: Shooting for Clear Diffs - Michal Paszkowski
Better C++ debugging using Clang Modules in LLDB - Raphael Isemann
Ownership SSA and Semantic SIL - Michael Gottesman
arm64e: An ABI for Pointer Authentication - Ahmed Bougacha, John McCall
Porting by a 1000 Patches: Bringing Swift to Windows - Saleem Abdulrasool
The Penultimate Challenge: Constructing bug reports in the Clang Static Analyzer - Kristóf Umann
Address Spaces in LLVM - Matt Arsenault
An MLIR Dialect for High-Level Optimization of Fortran - Eric Schweitz
Loop-transformation #pragmas in the front-end - Michael Kruse
Optimizing builds on Windows: some practical considerations - Alexandre Ganea
LLVM-Reduce for testcase reduction - Diego Treviño Ferrer
Memoro: Scaling an LLVM-based Heap profiler - Thierry Treyer
The Attributor: A Versatile Inter-procedural Fixpoint Iteration Framework - Johannes Doerfert
LLVM Tutorials: How to write Beginner-Friendly, Inclusive Tutorials - Meike Baumgärtner
Maturing an LLVM backend: Lessons learned from the RISC-V target - Alex Bradbury

Tutorials

Getting Started With LLVM: Basics - Jessica Paquette, Florian Hahn
ASTImporter: Merging Clang ASTs - Gábor Márton
Developing the Clang Static Analyzer - Artem Dergachev
Writing an LLVM Pass: 101 - Andrzej Warzynski
Writing Loop Optimizations in LLVM - Kit Barton, Ettore Tiotto, Hal Finkel, Michael Kruse, Johannes Doerfert
The Attributor: A Versatile Inter-procedural Fixpoint Iteration Framework - Johannes Doerfert
Getting Started with the LLVM Testing Infrastructure - Brian Homerding, Michael Kruse
An overview of Clang - Sven Van Haastregt, Anastasia Stulova
An overview of LLVM - Eric Christopher, Sanjoy Das, Johannes Doerfert
How to Contribute to LLVM - Chris Bieneman, Kit Barton
My First Clang Warning - Dmitri Gribenko, Meike Baumgartner

Student Research Competition

Cross-Translation Unit Optimization via Annotated Headers - William S. Moses
Quantifying Dataflow Analysis with Gradients in LLVM - Abhishek Shah
Floating Point Consistency in the Wild: A practical evaluation of how compiler optimizations affect high performance floating point code - Jack J Garzella
Static Analysis of OpenMP Data Mapping for Target Offloading - Prithayan Barua

Panels

Panel: Inter-procedural Optimization (IPO) - Teresa Johnson, Philip Reames, Chandler Carruth, Johannes Doerfert
The Loop Optimization Working Group - Kit Barton, Michael Kruse, TBD

Birds of a Feather

LLDB - Jonas Devlieghere
Towards Better Code Generator Design and Unification for a Stack Machine - Leonid Kholodov, Dmitry Borisenkov
Debug Info - Adrian Prantl

Lightning Talks

GWP-ASan: Zero-Cost Detection of Memory Safety Bugs in Production - Matt Morehouse
When 3 Memory Models Aren't Enough – OpenVMS on x86 - John Reagan
FileCheck: learning arithmetic - Thomas Preud'homme
-Wall Found Programming Errors and Engineering Effort to Enable Across a Large Codebase - Aditya Kumar
Handling 1000s of OpenCL builtin functions in Clang - Sven van Haastregt
NEC SX-Aurora as a Scalable Vector Playground - Kazuhisa Ishizaka
Implementing Machine Code Optimizations for RISC-V - Lewis Revill
Optimization Remarks Update - Francis Visoiu Mistrih
Supporting Regular and Thin LTO with a Single LTO Bitcode Format - Matthew Voss
Transitioning Apple's Downstream llvm-project Repositories to the Monorepo - Alex Lorenz
A Unified Debug Server For Deeply Embedded Systems and LLDB - Simon Cook
State of LLDB and Deeply Embedded RISC-V - Simon Cook
Supporting a Vendor ABI Variant in Clang - Paul Robinson
Speculative Compilation in ORC JIT - Praveen Velliengiri
Optimization Remarks for Human Beings - William Bundy
Improving the Optimized Debugging Experience - Orlando Cazalet-Hyams
Improving your TableGen Descriptions - Javed Absar
Loom: Weaving Instrumentation for Program Analysis - Brian Kidney
Clang Interface Stubs: Syntax Directed Stub Library Generation. - Puyan Lotfi
Flang Update - Steve Scalpone
Lowering tale: Supporting 64 bit pointers in RISCV 32 bit LLVM backend - Reshabh Sharma
Virtual Function Elimination in LLVM - Oliver Stannard
Making a Language Cross Platform: Libraries and Tooling - Gwen Mittertreiner
Grafter - A use case to implement an embedded DSL in C++ and perform source to source traversal fusion transformation using Clang - Laith Sakka

Posters

TON Labs Backend for TON Blockchain - Dmitry Borisenkov, Dmitry Shtukenberg, Leonid Kholodov
LLVM Build Times Using a Program Repository - Russell Gallop, Phil Camp
RISC-V Bit Manipulation Support in the Clang/LLVM Toolchain - Scott Egerton, Paolo Savini
Attributor, a Framework for Interprocedural Information Deduction - Johannes Doerfert, Hideto Ueno, Stefan Stipanovic
Overflows Be Gone: Checked C for Memory Safety - Mandeep Singh Grang
Cross-Translation Unit Optimization via Annotated Headers - William S. Moses
Quantifying Dataflow Analysis with Gradients in LLVM - Abhishek Shah
Floating Point Consistency in the Wild: A practical evaluation of how compiler optimizations affect high performance floating point code - Jack J Garzella
Static Analysis of OpenMP Data Mapping for Target Offloading - Prithayan Barua
NEC SX-Aurora as a Scalable Vector Playground - Kazuhisa Ishizaka
A Unified Debug Server For Deeply Embedded Systems and LLDB - Simon Cook
Speculative Compilation in ORC JIT - Praveen Velliengiri
Loom: Weaving Instrumentation for Program Analysis - Brian Kidney
Lowering tale: Supporting 64 bit pointers in RISCV 32 bit LLVM backend - Reshabh Sharma

The LLVM Project is Moving to GitHub

Thu, 01 Aug 2019 16:17:00 +0000

After several years of discussion and planning, the LLVM project is getting ready to complete the migration of its source code from SVN to GitHub! At last year's developer meeting, many interested community members convened at a series of round tables to lay out a plan to completely migrate LLVM source code from SVN to GitHub by the 2019 U.S. Developer's Meeting. We have made great progress over the last nine months and are on track to complete the migration on October 21, 2019.

As part of the migration to GitHub we are maintaining the 'monorepo' layout which currently exists in SVN. This means that there will be a single git repository with one top-level directory for each LLVM sub-project. This will be a change for those of you who are already using git and accessing the code via the official sub-project git mirrors (e.g. https://git.llvm.org/git/llvm.git) where each sub-project has its own repository.

One of the first questions people ask when they hear about the GitHub plans is: Will the project start using GitHub pull requests and issues? And the answer to that for now is: no. The current transition plan focuses on migrating only the source code. We will continue to use Phabricator for code reviews, and bugzilla for issue tracking after the migration is complete. We have not ruled out using pull requests and issues at some point in the future, but these are discussions we still need to have as a community.

The most important takeaway from this post, though, is that if you consume the LLVM source code in any way, you need to take action now to migrate your workflows. If you manage any continuous integration or other systems that need read-only access to the LLVM source code, you should begin pulling from the official GitHub repository instead of SVN or the current sub-project mirrors. If you are a developer that needs to commit code, please use the git-llvm script for committing changes.

We have created a status page, if you want to track the current progress of the migration. We will be posting updates to this page as we get closer to the completion date. If you run into issues of any kind with GitHub you can file a bug in bugzilla and mark it as a blocker of the github tracking bug.

This entire process has been a large community effort. Many, many people have put in time discussing, planning, and implementing all the steps required to make this happen. Thank you to everyone who has been involved and let's keep working to make this migration a success.

Blog post by Tom Stellard.

LLVM and Google Season of Docs

Fri, 24 May 2019 11:09:00 +0000

The LLVM Project is pleased to announce that we have been selected to participate in Google's Season of Docs!

Our project idea list may be found here:

http://llvm.org/SeasonOfDocs.html

From now until May 29th, technical writers are encouraged to review the proposed project ideas and to ask any questions they have on our [email protected] mailing list. Other documentation ideas are allowed, but we cannot guarantee that a mentor will be found for the project. You are encouraged to discuss new ideas on the mailing list prior to submitting your technical writer application, in order to start the process of finding a mentor.

When submitting your application for an LLVM documentation project, please consider the following:

Include Prior Experience: Do you have prior technical writing experience? We want to see this! Consider including links to prior documentation or attachments of documentation you have written. If you can't include a link to the actual documentation, please describe in detail what you wrote, who the audience was, and any other important information that can help us gauge your prior experience. Please also include any experience with Sphinx or other documentation generation tools.
Take your time writing the proposal: We will be looking closely at your application to see how well it is written. Take the time to proofread and know who your audience is.
Propose your plan for our documentation project: We have given a rough idea of what changes or topics we envision for the documentation, but this is just a start. We expect you to take the idea and expand or modify it as you see fit. Review our existing documentation and see how it would complement or replace other pieces. Optionally include an overview or document design or layout plan in your application.
Become familiar with our project: We don't expect you to become a compiler expert, but we do expect you to read up on our project to learn a bit about LLVM.

We look forward to working with some fabulous technical writers and improving our documentation. Again, please email [email protected] with your questions.

LLVM Numerics Blog

Fri, 15 Mar 2019 15:47:00 +0000

Keywords: Numerics, Clang, LLVM-IR, 2019 LLVM Developers' Meeting, LLVMDevMtg.

The goal of this blog post is to start a discussion about numerics in LLVM: where we are, recent work and things that remain to be done. There will be an informal discussion on numerics at the 2019 EuroLLVM conference next month. One purpose of this blog post is to refresh everyone's memory on where we are on the topic of numerics to restart the discussion.

In the last year or two there has been a push to allow fine-grained decisions on which optimizations are legitimate for any given piece of IR.  In earlier days there were two main modes of operation: fast-math and precise-math.  When operating under the rules of precise-math, defined by IEEE-754, a significant number of potential optimizations on sequences of arithmetic instructions are not allowed because they could lead to violations of the standard.  

For example: 

The Reassociation optimization pass is generally not allowed under precise code generation as it can change the order of operations altering the creation of NaN and Inf values propagated at the expression level as well as altering precision.  
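To make the reassociation example concrete, here is a small sketch in Rust (the function names are made up for illustration). It spells out both evaluation orders by hand, since the compiler will not reorder floating-point additions on its own under precise rules, and shows that the two orders can disagree both in precision and in whether an Inf is produced.

```rust
/// Evaluates (a + b) + c, rounding after each addition.
pub fn sum_left_first(a: f64, b: f64, c: f64) -> f64 {
    (a + b) + c
}

/// Evaluates a + (b + c): the same mathematical sum, reassociated.
pub fn sum_right_first(a: f64, b: f64, c: f64) -> f64 {
    a + (b + c)
}
```

With the inputs (1e20, -1e20, 1.0) the first order yields 1.0, while the second yields 0.0 because 1.0 is absorbed when it is added to -1e20. With (f64::MAX, f64::MAX, -f64::MAX) the first order overflows to infinity, while the second returns f64::MAX.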

Precise code generation is often overly restrictive, so an alternative fast-math mode is commonly used where all possible optimizations are allowed, acknowledging that this impacts the precision of results and possibly IEEE compliant behavior as well. In LLVM, this can be enabled by setting the unsafe-math flag at the module level, or by passing -funsafe-math-optimizations to clang, which then sets flags on the IR it generates. Within this context the compiler often generates shorter sequences of instructions to compute results, and depending on the context this may be acceptable. Fast-math is often used in computations where loss of precision is acceptable. For example, when computing the color of a pixel, even relatively low precision is likely to far exceed the perception abilities of the eye, making shorter instruction sequences an attractive trade-off. In long-running simulations of physical events, however, loss of precision can mean that the simulation drifts from reality, making the trade-off unacceptable.

Several years ago LLVM IR instructions gained the ability of being annotated with flags that can drive optimizations with more granularity than an all-or-nothing decision at the module level.  The IR flags in question are: 

nnan, ninf, nsz, arcp, contract, afn, reassoc, nsw, nuw, exact.  

Their exact meaning is described in the LLVM Language Reference Manual. When all the flags are enabled, we get the current fast-math behavior. When these flags are disabled, we get precise math behavior. There are also several options available between these two models that may be attractive to some applications. In the past year several members of the LLVM community worked on making IR optimization passes aware of these flags. When the unsafe-math module flag is not set these optimization passes will work by examining individual flags, allowing fine-grained selection of the optimizations that can be enabled on specific instruction sequences. This allows vendors/implementors to mix fast and precise computations in the same module, aggressively optimizing some instruction sequences but not others.
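As a rough illustration of why these flags are opt-in per instruction, the sketch below writes out by hand, in Rust, the "before" and "after" forms of two rewrites that individual flags license: contract (fusing a multiply and an add into one fused multiply-add) and arcp (replacing a division with a multiplication by the reciprocal). The function names are hypothetical. The code does not set any LLVM flags; it only shows that the two forms of each computation can return different values.

```rust
/// Multiply, round, then subtract and round again: the precise form.
pub fn mul_then_sub(x: f64, c: f64) -> f64 {
    x * x - c
}

/// Fused form with a single rounding at the end. This is the kind of
/// instruction that `contract` allows the optimizer to form.
pub fn fused_mul_sub(x: f64, c: f64) -> f64 {
    x.mul_add(x, -c)
}

/// Plain division: the precise form.
pub fn divide(x: f64, y: f64) -> f64 {
    x / y
}

/// Multiplication by the reciprocal: the rewrite that `arcp` allows.
pub fn mul_by_reciprocal(x: f64, y: f64) -> f64 {
    x * (1.0 / y)
}
```

For x = 1 + 2^-30 and c = 1 + 2^-29, mul_then_sub returns 0.0 because the product is rounded before the subtraction, while fused_mul_sub returns 2^-60. Similarly, divide(49.0, 49.0) is exactly 1.0, while mul_by_reciprocal(49.0, 49.0) comes out slightly below 1.0. Either answer may be fine for a given application, which is the point of being able to choose per instruction sequence.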

We now have good coverage of IR passes in the LLVM codebase, in particular in the following areas:

* Intrinsic and libcall management

* Instruction Combining and Simplification

* Instruction definition

* SDNode definition

* GlobalISel Combining and code generation

* Selection DAG code generation

* DAG Combining

* Machine Instruction definition

* IR Builders (SDNode, Instruction, MachineInstr)

* CSE tracking

* Reassociation

* Bitcode

There are still some areas that need to be reworked for modularity, including vendor specific back-end passes.  

The following are some of the contributions mentioned above from the last 2 years of open source development:

https://reviews.llvm.org/D45781 : MachineInst support mapping SDNode fast math flags for support in Back End code generation 

https://reviews.llvm.org/D46322 : [SelectionDAG] propagate 'afn' and 'reassoc' from IR fast-math-flags

https://reviews.llvm.org/D45710 : Fast Math Flag mapping into SDNode

https://reviews.llvm.org/D46854 : [DAG] propagate FMF for all FPMathOperators

https://reviews.llvm.org/D48180 : updating isNegatibleForFree and GetNegatedExpression with fmf for fadd

https://reviews.llvm.org/D48057: easing the constraint for isNegatibleForFree and GetNegatedExpression

https://reviews.llvm.org/D47954 : Utilize new SDNode flag functionality to expand current support for fdiv

https://reviews.llvm.org/D47918 : Utilize new SDNode flag functionality to expand current support for fma

https://reviews.llvm.org/D47909 : Utilize new SDNode flag functionality to expand current support for fadd

https://reviews.llvm.org/D47910 : Utilize new SDNode flag functionality to expand current support for fsub

https://reviews.llvm.org/D47911 : Utilize new SDNode flag functionality to expand current support for fmul

https://reviews.llvm.org/D48289 : refactor of visitFADD for AllowNewConst cases

https://reviews.llvm.org/D47388 : propagate fast math flags via IR on fma and sub expressions

https://reviews.llvm.org/D47389 : guard fneg with fmf sub flags

https://reviews.llvm.org/D47026 : fold FP binops with undef operands to NaN

https://reviews.llvm.org/D47749 : guard fsqrt with fmf sub flags

https://reviews.llvm.org/D46447 : Mapping SDNode flags to MachineInstr flags

rL334970: [NFC] make MIFlag accessor functions consistant with usage model

rL338604: [NFC] small addendum to r334242, FMF propagation

https://reviews.llvm.org/D50195 : extend folding fsub/fadd to fneg for FMF

https://reviews.llvm.org/rL339197 : [NFC] adding tests for Y - (X + Y) --> -X

https://reviews.llvm.org/D50417 : [InstCombine] fold fneg into constant operand of fmul/fdiv

https://reviews.llvm.org/rL339357 : extend folding fsub/fadd to fneg for FMF

https://reviews.llvm.org/D50996 : extend binop folds for selects to include true and false binops flag intersection

https://reviews.llvm.org/rL339938 : add a missed case for binary op FMF propagation under select folds

https://reviews.llvm.org/D51145 : Guard FMF context by excluding some FP operators from FPMathOperator

https://reviews.llvm.org/rL341138 : adding initial intersect test for Node to Instruction association

https://reviews.llvm.org/rL341565 : in preparation for adding nsw, nuw and exact as flags to MI

https://reviews.llvm.org/D51738 : add IR flags to MI

https://reviews.llvm.org/D52006 : Copy utilities updated and added for MI flags

https://reviews.llvm.org/rL342598 : add new flags to a DebugInfo lit test

https://reviews.llvm.org/D53874 : [InstSimplify] fold 'fcmp nnan oge X, 0.0' when X is not negative

https://reviews.llvm.org/D55668 : Add FMF management to common fp intrinsics in GlobalIsel

https://reviews.llvm.org/rL352396 : [NFC] TLI query with default(on) behavior wrt DAG combines for fmin/fmax target…

https://reviews.llvm.org/rL316753 (Fold fma (fneg x), K, y -> fma x, -K, y)

https://reviews.llvm.org/D57630 : Move IR flag handling directly into builder calls for cases translated from Instructions in GlobalIsel

https://reviews.llvm.org/rL332756 : adding baseline fp fold tests for unsafe on and off

https://reviews.llvm.org/rL334035 : NFC: adding baseline fneg case for fmf

https://reviews.llvm.org/rL325832 : [InstrTypes] add frem and fneg with FMF creators

https://reviews.llvm.org/D41342 : [InstCombine] Missed optimization in math expression: simplify calls exp functions

https://reviews.llvm.org/D52087 : [IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle.

https://reviews.llvm.org/D52075 : [InstCombine] Support (sub (sext x), (sext y)) --> (sext (sub x, y)) and (sub (zext x), (zext y)) --> (zext (sub x, y))

https://reviews.llvm.org/rL338059 : [InstCombine] fold udiv with common factor from muls with nuw

Commit: e0ab896a84be9e7beb59874b30f3ac51ba14d025 : [InstCombine] allow more fmul folds with 'reassoc'

Commit: 3e5c120fbac7bdd4b0ff0a3252344ce66d5633f9 : [InstCombine] distribute fmul over fadd/fsub

https://reviews.llvm.org/D37427 : [InstCombine] canonicalize fcmp ord/uno with constants to null constant

https://reviews.llvm.org/D40130 : [InstSimplify] fold and/or of fcmp ord/uno when operand is known nnan

https://reviews.llvm.org/D40150 : [LibCallSimplifier] fix pow(x, 0.5) -> sqrt() transforms

https://reviews.llvm.org/D39642 : [ValueTracking] readnone is a requirement for converting sqrt to llvm.sqrt; nnan is not

https://reviews.llvm.org/D39304 : [IR] redefine 'reassoc' fast-math-flag and add 'trans' fast-math-flag

https://reviews.llvm.org/D41333 : [ValueTracking] ignore FP signed-zero when detecting a casted-to-integer fmin/fmax pattern

https://reviews.llvm.org/D5584 : Optimize square root squared (PR21126)

https://reviews.llvm.org/D42385 : [InstSimplify] (X * Y) / Y --> X for relaxed floating-point ops

https://reviews.llvm.org/D43160 : [InstSimplify] allow exp/log simplifications with only 'reassoc' FMF

https://reviews.llvm.org/D43398 : [InstCombine] allow fdiv folds with less than fully 'fast' ops

https://reviews.llvm.org/D44308 : [ConstantFold] fp_binop AnyConstant, undef --> NaN

https://reviews.llvm.org/D43765 : [InstSimplify] loosen FMF for sqrt(X) * sqrt(X) --> X

https://reviews.llvm.org/D44521 : [InstSimplify] fp_binop X, NaN --> NaN

https://reviews.llvm.org/D47202 : [CodeGen] use nsw negation for abs

https://reviews.llvm.org/D48085 : [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros

https://reviews.llvm.org/D48401 : [InstCombine] fold vector select of binops with constant ops to 1 binop (PR37806)

https://reviews.llvm.org/D39669 : DAG: Preserve nuw when reassociating adds

https://reviews.llvm.org/D39417 : InstCombine: Preserve nuw when reassociating nuw ops

https://reviews.llvm.org/D51753 : [DAGCombiner] try to convert pow(x, 1/3) to cbrt(x)

https://reviews.llvm.org/D51630 : [DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x))

https://reviews.llvm.org/D53650 : [FPEnv] Last BinaryOperator::isFNeg(...) to m_FNeg(...) changes

https://reviews.llvm.org/D54001 : [ValueTracking] determine sign of 0.0 from select when matching min/max FP

https://reviews.llvm.org/D51942 : [InstCombine] Fold (C/x)>0 into x>0 if possible

https://llvm.org/svn/llvm-project/llvm/trunk@348016 : [SelectionDAG] fold FP binops with 2 undef operands to undef

http://llvm.org/viewvc/llvm-project?view=revision&revision=346242 : propagate fast-math-flags when folding fcmp+fpext, part 2

http://llvm.org/viewvc/llvm-project?view=revision&revision=346240 : propagate fast-math-flags when folding fcmp+fpext

http://llvm.org/viewvc/llvm-project?view=revision&revision=346238 : [InstCombine] propagate fast-math-flags when folding fcmp+fneg, part 2

http://llvm.org/viewvc/llvm-project?view=revision&revision=346169 : [InstSimplify] fold select (fcmp X, Y), X, Y

http://llvm.org/viewvc/llvm-project?view=revision&revision=346234 : propagate fast-math-flags when folding fcmp+fneg

http://llvm.org/viewvc/llvm-project?view=revision&revision=346147 : [InstCombine] canonicalize -0.0 to +0.0 in fcmp

http://llvm.org/viewvc/llvm-project?view=revision&revision=346143 : [InstCombine] loosen FP 0.0 constraint for fcmp+select substitution

http://llvm.org/viewvc/llvm-project?view=revision&revision=345734 : [InstCombine] refactor fabs+fcmp fold; NFC

http://llvm.org/viewvc/llvm-project?view=revision&revision=345728 : [InstSimplify] fold 'fcmp nnan ult X, 0.0' when X is not negative

http://llvm.org/viewvc/llvm-project?view=revision&revision=345727 : [InstCombine] add assertion that InstSimplify has folded a fabs+fcmp; NFC
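
Several of the patches listed above are guarded by the no-signed-zeros flag. As a minimal sketch of why that guard matters, the snippet below compares the (float)((int) f) pattern with a truncation call for a small negative input. The helper names are hypothetical and are not taken from any of the patches; the sketch assumes IEEE-754 floats and default (non-fast-math) compilation.

```cpp
#include <cassert>
#include <cmath>

// Round-trip through int, as in the source pattern (float)((int) f).
// Only defined when f is within the range of int.
inline float via_int_cast(float f) {
    return static_cast<float>(static_cast<int>(f));
}

// The cheaper replacement a fold would like to use instead.
inline float via_trunc(float f) {
    return std::trunc(f);
}

// True when both results compare equal and also carry the same sign bit.
inline bool same_including_sign(float a, float b) {
    return a == b && std::signbit(a) == std::signbit(b);
}

// Self-check: the two forms agree for 2.75f but differ in the sign of zero
// for -0.5f (the int round-trip yields +0.0, truncation yields -0.0).
inline void check_signed_zero_difference() {
    assert(same_including_sign(via_int_cast(2.75f), via_trunc(2.75f)));
    assert(via_int_cast(-0.5f) == via_trunc(-0.5f));
    assert(!same_including_sign(via_int_cast(-0.5f), via_trunc(-0.5f)));
}
```

Calling check_signed_zero_difference() passes silently. The two forms compare equal with ==, so the difference is only observable by code that inspects the sign of zero, which is why such a rewrite is only safe when signed zeros may be ignored.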

While multiple people have been working on finer-grained control over fast-math optimizations and other relaxed numerics modes, there has also been some initial progress on adding support for more constrained numerics models. There has been considerable progress towards adding and enabling constrained floating-point intrinsics to capture FENV_ACCESS ON and similar semantic models.

These experimental constrained intrinsics prohibit certain transforms that are not safe if the default floating-point environment is not in effect. Historically, LLVM has in practice basically "split the difference" with regard to such transforms; they haven't been explicitly disallowed, as LLVM doesn't model the floating-point environment, but they have been disabled when they caused trouble for tests or software projects. The absence of a formal model for licensing these transforms constrains our ability to enable them. Bringing language and backend support for constrained intrinsics across the finish line will allow us to include transforms that we disable as a matter of practicality today, and allow us to give developers an easy escape valve (in the form of FENV_ACCESS ON and similar language controls) when they need more precise control, rather than an ad-hoc set of flags to pass to the driver.
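
To make the concern concrete, here is a small sketch of code whose result depends on a non-default floating-point environment. The function name is hypothetical, and the sketch assumes a platform that defines FE_UPWARD and FE_DOWNWARD. Because compiler support for the FENV_ACCESS pragma varies, it uses volatile operands to keep the division from being folded at compile time under the default rounding mode.

```cpp
#include <cassert>
#include <cfenv>

// In standard C or C++ this function would be preceded by
// "#pragma STDC FENV_ACCESS ON". This sketch relies on volatile instead,
// so that the division is performed at run time between the two
// fesetround calls.
inline double divide_with_rounding(double num, double den, int mode) {
    const int saved = std::fegetround();  // remember the caller's mode
    std::fesetround(mode);
    volatile double n = num;
    volatile double d = den;
    volatile double q = n / d;            // evaluated under 'mode'
    std::fesetround(saved);               // restore the caller's mode
    return q;
}

// Self-check: 1/3 is inexact, so rounding up and rounding down must give
// different results; 1/4 is exact, so the mode makes no difference.
inline void check_rounding_modes() {
    assert(divide_with_rounding(1.0, 3.0, FE_UPWARD) >
           divide_with_rounding(1.0, 3.0, FE_DOWNWARD));
    assert(divide_with_rounding(1.0, 4.0, FE_UPWARD) == 0.25);
}
```

A transform that evaluates 1.0 / 3.0 at compile time would silently pick the round-to-nearest result, which is exactly the kind of rewrite that is only valid when the default environment is in effect.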

We should discuss these new intrinsics to make sure that they can capture the right models for all the languages that LLVM supports.

Here are some possible discussion items:

Should specialization be applied at the call level for edges in a call graph where the caller has special context to extend into the callee with respect to flags?
Should the inliner apply something similar to calls that meet inlining criteria?
What other part(s) of the compiler could make use of IR flags that are currently not covered?
What work needs to be done regarding code debt in the current areas of implementation?

FOSDEM 2019 LLVM developer room report

Thu, 07 Mar 2019 02:33:00 +0000

As well as at the LLVM developer meetings, the LLVM community is also present at a number of other events. One of those is FOSDEM, which has had a dedicated LLVM track since 2014.

Earlier this February, the LLVM dev room was back for the 6th time.

FOSDEM is one of the largest open source conferences, attracting over 8000 developers attending over 30 parallel tracks, occupying almost all space of the ULB university campus in Brussels.

In comparison to the LLVM developer meetings, this dev room offers more of an opportunity to meet up with developers from a very wide range of open source projects.

As in previous years, the LLVM dev room program consisted of presentations with a varied target audience, ranging from LLVM developers to LLVM users, including people not yet using LLVM but interested in discovering what can be done with it.

On the day itself, the room was completely packed for most presentations, often with people waiting outside to be able to enter for the next presentation.

Slides and videos of the presentations are available via the links below:

Roll your own compiler with LLVM (Kai Nacke)
Rewriting Pointer Dereferences in bcc with Clang (Paul Chaignon)
Building an LLVM-based tool (Alex Denisov)
Debug info in optimized code - how far can we go? (Nikola Prica, Djordje Todorovic)
Lessons in TableGen (Nicolai Hähnle)
LLVM for the Apollo Guidance Computer (Lewis Revill)
llvm.mix Multi-stage compiler-assisted specializer generator built on LLVM (Eugene Sharygin)
SMT-Based Refutation of Spurious Bug Reports in the Clang Static Analyzer (Mikhail Gadelha)
What makes LLD so fast? (Peter Smith)
Compiling the Linux kernel with LLVM tools (Nick Desaulniers and Bill Wendling)
It was working yesterday! Investigating regressions with llvmlab bisect (Leandro Nunes)

Finally, I want to express my gratitude to the LLVM Foundation, which sponsored travel expenses for a few presenters who couldn't otherwise have made it to the conference.

EuroLLVM'19 developers' meeting program

Mon, 11 Feb 2019 09:49:00 +0000

The LLVM Foundation is excited to announce the program for the EuroLLVM'19 developers' meeting (April 8-9 in Brussels, Belgium)!

Keynote

MLIR: Multi-Level Intermediate Representation for Compiler Infrastructure Tatiana Shpeisman (Google), Chris Lattner (Google)

Technical talks

Switching a Linux distribution's main toolchains to LLVM/Clang Bernhard Rosenkränzer (Linaro, OpenMandriva, LinDev)
Just compile it: High-level programming on the GPU with Julia Tim Besard (Ghent University)
The Future of AST Matcher-based Refactoring Stephen Kelly
A compiler approach to Cyber-Security François de Ferrière (STMicroelectronics)
Compiler Optimizations for (OpenMP) Target Offloading to GPUs Johannes Doerfert (Argonne National Laboratory), Hal Finkel (Argonne National Laboratory)
Handling massive concurrency: Development of a programming model for GPU and CPU Matthias Liedtke (SAP)
Automated GPU Kernel Fusion with XLA Thomas Joerg (Google)
The Helium Haskell compiler and its new LLVM backend Ivo Gabe de Wolff (University of Utrecht)
Testing and Qualification of Optimizing Compilers for Functional Safety José Luis March Cabrelles (Solid Sands)
Improving Debug Information in LLVM to Recover Optimized-out Function Parameters Nikola Prica (RT-RK), Djordje Todorovic (RT-RK), Ananthakrishna Sowda (CISCO), Ivan Baev (CISCO)
LLVM IR in GraalVM: Multi-Level, Polyglot Debugging with Sulong Jacob Kreindl (Johannes Kepler University Linz)
LLDB Reproducers Jonas Devlieghere (Apple)
Sulong: An experience report of using the "other end" of LLVM in GraalVM. Roland Schatz (Oracle Labs), Josef Eisl (Oracle Labs)
SYCL compiler: zero-cost abstraction and type safety for heterogeneous computing Andrew Savonichev (Intel)
Handling all Facebook requests with JITed C++ code Huapeng Zhou (Facebook), Yuhan Guo (Facebook)
clang-scan-deps: Fast dependency scanning for explicit modules Alex Lorenz (Apple), Michael Spencer (Apple)
Clang tools for implementing cryptographic protocols like OTRv4 Sofia Celi (Centro de Autonomia Digital)
Implementing the C++ Core Guidelines' Lifetime Safety Profile in Clang Gabor Horvath (Eotvos Lorand University), Matthias Gehre (Silexica GmbH), Herb Sutter (Microsoft)
Changes to the C++ standard library for C++20 Marshall Clow (CppAlliance)
Adventures with RISC-V Vectors and LLVM Robin Kruppe (TU Darmstadt), Roger Espasa (Esperanto Technologies)
A Tale of Two ABIs: ILP32 on AArch64 Tim Northover (Apple)
LLVM Numerics Improvements Michael Berg (Apple), Steve Canon (Apple)
DOE Proxy Apps: Compiler Performance Analysis and Optimistic Annotation Exploration Brian Homerding (Argonne National Laboratory), Johannes Doerfert (Argonne National Laboratory)
Loop Fusion, Loop Distribution and their Place in the Loop Optimization Pipeline Kit Barton (IBM), Johannes Doerfert (Argonne National Lab), Hal Finkel (Argonne National Lab), Michael Kruse (Argonne National Lab)

Tutorials

Tutorial: Building a Compiler with MLIR Amini Mehdi (Google), Jacques Pienaar (Google), Nicolas Vasilache (Google)
Building an LLVM-based tool: lessons learned Alex Denisov
LLVM IR Tutorial - Phis, GEPs and other things, oh my! Vince Bridgers (Intel Corporation), Felipe de Azevedo Piovezan (Intel Corporation)

Student Research Competition

Safely Optimizing Casts between Pointers and Integers Juneyoung Lee (Seoul National University, Korea), Chung-Kil Hur (Seoul National University, Korea), Ralf Jung (MPI-SWS, Germany), Zhengyang Liu (University of Utah, USA), John Regehr (University of Utah, USA), Nuno P. Lopes (Microsoft Research, UK)
An alternative OpenMP Backend for Polly Michael Halkenhäuser (TU Darmstadt)
Implementing SPMD control flow in LLVM using reconverging CFGs Fabian Wahlster (Technische Universität München), Nicolai Hähnle (Advanced Micro Devices)
Function Merging by Sequence Alignment Rodrigo Rocha (University of Edinburgh), Pavlos Petoumenos (University of Edinburgh), Zheng Wang (Lancaster University), Murray Cole (University of Edinburgh), Hugh Leather (University of Edinburgh)
Compilation and optimization with security annotations Son Tuan Vu (LIP6), Karine Heydemann (LIP6), Arnaud de Grandmaison (ARM), Albert Cohen (Google)
Adding support for C++ contracts to Clang Javier López-Gómez (University Carlos III of Madrid), J. Daniel García (University Carlos III of Madrid)

Lightning talks

LLVM IR Timing Predictions: Fast Explorations via lli Alessandro Cornaglia (FZI - Research Center for Information Technology)
Simple Outer-Loop-Vectorization == LoopUnroll-And-Jam + SLP Dibyendu Das (AMD)
Clacc 2019: An Update on OpenACC Support for Clang and LLVM Joel E. Denny (Oak Ridge National Laboratory), Seyong Lee (Oak Ridge National Laboratory), Jeffrey S. Vetter (Oak Ridge National Laboratory)
Targeting a statically compiled program repository with LLVM Phil Camp (SN Systems (Sony Interactive Entertainment)), Russell Gallop (SN Systems (Sony Interactive Entertainment))
Does the win32 clang compiler executable really need to be over 21MB in size? Russell Gallop (SN Systems), Greg Bedwell (SN Systems)
Resolving the almost decade old checker dependency issue in the Clang Static Analyzer Kristóf Umann (Ericsson Hungary, Eötvös Loránd University)
Adopting LLVM Binary Utilities in Toolchains Jordan Rupprecht (Google)
Multiplication and Division in the Range-Based Constraint Manager Ádám Balogh (Ericsson Hungary Ltd.)
Statistics Based Checkers in the Clang Static Analyzer Ádám Balogh (Ericsson Hungary Ltd.)
Flang Update Steve Scalpone (NVIDIA / PGI / Flang)
Swinging Modulo Scheduling together with Register Allocation Lama Saba (Intel)
LLVM for the Apollo Guidance Computer Lewis Revill (University of Bath)
Catch dangling inner pointers with the Clang Static Analyzer Réka Kovács (Eötvös Loránd University)
Cross translation unit test case reduction Réka Kovács (Eötvös Loránd University)

BoFs

RFC: Towards Vector Predication in LLVM IR Simon Moll (Saarland University), Sebastian Hack (Saarland University)
IPO --- Where are we, where do we want to go? Johannes Doerfert (Argonne National Laboratory), Kit Barton (IBM Toronto Lab)
LLVM binutils James Henderson (SN Systems (Sony Interactive Entertainment)), Jordan Rupprecht (Google)
RFC: Reference OpenCL Runtime library for LLVM Andrew Savonichev (Intel), Alexey Sachkov (Intel)
LLVM Interface Stability Guarantees BoF Stephen Kelly
Clang Static Analyzer BoF Devin Coughlin (Apple), Gabor Horvath (Eotvos Lorand University)
LLVM Numerics Improvements Michael Berg (Apple), Steve Canon (Apple)

Posters

Clava: C/C++ source-to-source from CMake using LARA João Bispo (FEUP/INESCTEC)
Safely Optimizing Casts between Pointers and Integers Juneyoung Lee (Seoul National University, Korea), Chung-Kil Hur (Seoul National University, Korea), Ralf Jung (MPI-SWS, Germany), Zhengyang Liu (University of Utah, USA), John Regehr (University of Utah, USA), Nuno P. Lopes (Microsoft Research, UK)
Scalar Evolution Canon: Click! Canonicalize SCEV and validate it by Z3 SMT solver! Lin-Ya Yu (Xilinx), Alexandre Isoard (Xilinx)
Splendid GVN: Partial Redundancy Elimination for Algebraic Simplification Li-An Her (National Tsing Hua University), Jenq-Kuen Lee (National Tsing Hua University)
An alternative OpenMP Backend for Polly Michael Halkenhäuser (TU Darmstadt)
Does the win32 clang compiler executable really need to be over 21MB in size? Russell Gallop (SN Systems), G Bedwell (SN Systems)
Enabling Multi- and Cross-Language Verification with LLVM Zvonimir Rakamaric (University of Utah)
Instruction Tracing and dynamic codegen analysis to identify unique llvm performance issues. Biplob (IBM)
Handling all Facebook requests with JITed C++ code Huapeng Zhou (Facebook), Yuhan Guo (Facebook)
Implementing SPMD control flow in LLVM using reconverging CFGs Fabian Wahlster (Technische Universität München), Nicolai Hähnle (Advanced Micro Devices)
LLVM for the Apollo Guidance Computer Lewis Revill (University of Bath)
LLVM Miner: Text Analytics based Static Knowledge Extractor Hameeza Ahmed (NED University of Engineering and Technology), Muhammad Ali Ismail (NED University of Engineering and Technology)
Function Merging by Sequence Alignment Rodrigo Rocha (University of Edinburgh), Pavlos Petoumenos (University of Edinburgh), Zheng Wang (Lancaster University), Murray Cole (University of Edinburgh), Hugh Leather (University of Edinburgh)
Compilation and optimization with security annotations Son Tuan Vu (LIP6), Karine Heydemann (LIP6), Arnaud de Grandmaison (ARM), Albert Cohen (Google)
Cross translation unit test case reduction Réka Kovács (Eötvös Loránd University)
Leveraging Polyhedral Compilation in Chapel Compiler Sahil Yerawar (IIT Hyderabad), Siddharth Bhat (IIIT Hyderabad), Michael Ferguson (Cray Inc.), Philip Pfaffe (Karlsruhe Institute of Technology), Ramakrishna Upadrasta (IIT Hyderabad)
LLVM on AVR - textual IR as a powerful tool for making "impossible" compilers Carl Peto (Swift for Arduino/Petosoft)
Vectorizing Add/Sub Expressions with SLP Vasileios Porpodas (Intel Corporation, USA), Rodrigo C. O. Rocha (University of Edinburgh, UK), Evgueni Brevnov (Intel Corporation, USA), Luís F. W. Góes (PUC Minas, Brazil), Timothy Mattson (Intel Corporation, USA)
Adding support for C++ contracts to Clang Javier López-Gómez (University Carlos III of Madrid), J. Daniel García (University Carlos III of Madrid)
Optimizing Nondeterminacy: Exploiting Race Conditions in Parallel Programs William S. Moses (MIT CSAIL)

If you are interested in any of these talks, you should register to attend EuroLLVM'19. Tickets are limited!

More information about the EuroLLVM'19 is available here.

30% faster Windows builds with clang-cl and the new /Zc:dllexportInlines- flag

Wed, 14 Nov 2018 04:49:00 +0000

Background

In the course of adding Microsoft Visual C++ (MSVC) compatible Windows support to Clang, we worked hard to make sure the dllexport and dllimport declspecs are handled the same way by Clang as by MSVC.

dllexport and dllimport are used to specify what functions and variables should be externally accessible ("exported") from the currently compiled Dynamic-Link Library (DLL), or should be accessed ("imported") from another DLL. In the class declaration below, S::foo() will be exported when building a DLL:


struct __declspec(dllexport) S {
 void foo() {}
};

and code using that DLL would typically see a declaration like this:


struct __declspec(dllimport) S {
 void foo() {}
};

to indicate that the function is defined in and should be accessed from another DLL.

Often the same declaration is used along with a preprocessor macro to flip between dllexport and dllimport, depending on whether a DLL is being built or consumed.
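
As a rough illustration of that macro pattern, the sketch below uses hypothetical names (MYLIB_API and MYLIB_BUILDING_DLL are assumptions, not taken from any particular project). The build system is assumed to define MYLIB_BUILDING_DLL only while compiling the DLL itself.

```cpp
#include <cassert>

// MYLIB_BUILDING_DLL is a hypothetical macro that the build system would
// define only when compiling the DLL; consumers leave it undefined.
#if defined(_WIN32)
#  if defined(MYLIB_BUILDING_DLL)
#    define MYLIB_API __declspec(dllexport)  // building the DLL
#  else
#    define MYLIB_API __declspec(dllimport)  // consuming the DLL
#  endif
#else
#  define MYLIB_API                          // no declspec outside Windows
#endif

// One declaration serves both the DLL and its users.
struct MYLIB_API S {
  void foo() {}
  int answer() const { return 42; }
};

// Self-check that the annotated class is usable like any other class.
inline void check_annotated_class() {
  S s;
  s.foo();
  assert(s.answer() == 42);
}
```

With this arrangement the header is written once, and the single macro decides which declspec every annotated class receives.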

The basic idea of dllexport and dllimport is simple, but the semantics get more complicated as they interact with more facets of the C++ language: templates, inheritance, different kinds of instantiation, redeclarations with different declspecs, and so on. Sometimes the semantics are surprising, but by now we think clang-cl gets most of them right. And as the old maxim goes, once you know the rules well, you can start tactfully breaking them.

One issue with dllexport is that for inline functions such as S::foo() above, the compiler must emit the definition even if it's not used in the translation unit. That's because the DLL must export it, and the compiler cannot know if any other translation unit will provide a definition.

This is very inefficient. A dllexported class with inline members in a header file will cause definitions of those members to be emitted in every translation unit that includes the header, directly or indirectly. And as we know, C++ source files often end up including a lot of headers. This behaviour is also different from non-Windows systems, where inline function definitions are not emitted unless they're used, even in shared objects and dynamic libraries.

/Zc:dllexportInlines-

To address this problem, clang-cl recently gained a new command-line flag, /Zc:dllexportInlines- (MSVC uses the /Zc: prefix for language conformance options). The basic idea is simple: since the definition of an inline function is available along with its declaration, it's not necessary to import or export it from a DLL — the inline definition can be used directly. The effect of the flag is to not apply class-level dllexport/dllimport declspecs to inline member functions. In the two examples above, it means S::foo() would not be dllexported or dllimported, even though the S class is declared as such.

This is very similar to the -fvisibility-inlines-hidden Clang and GCC flag used on non-Windows. For C++ projects with many inline functions, it can significantly reduce the set of exported functions, and thereby the symbol table and file size of the shared object or dynamic library, as well as program load time.

On Windows however, the main benefit is not having to emit the unused inline function definitions. This means the compiler has to do much less work, and reduces object file size which in turn reduces the work for the linker. For Chrome, we saw 30 % faster full builds, 30 % shorter link times for blink_core.dll, and 40 % smaller total .obj file size.

The reduction in .obj file size, combined with the enormous reduction in .lib files allowed by previously switching linkers to lld-link which uses thin archives, means that a typical Chrome build directory is now 60 % smaller than it would have been just a year ago.

(Some of the same benefit can be had without this flag if the dllexport inline function comes from a pre-compiled header (PCH) file. In that case, the definition will be emitted in the object file when building the PCH, and so is not emitted elsewhere unless it's used.)

Compatibility

Using /Zc:dllexportInlines- is "half ABI incompatible". If it's used to build a DLL, inline members will no longer be exported, so any code using the DLL must use the same flag to not dllimport those members. However, the reverse scenario generally works: a DLL compiled without the flag (such as a system DLL built with MSVC) can be referenced from code that uses the flag, meaning that the referencing code will use the inline definitions instead of importing them from the DLL.

Like -fvisibility-inlines-hidden, /Zc:dllexportInlines- breaks the C++ language guarantee that (even an inline) function has a unique address within the program. When using these flags, an inline function will have a different address when used inside the library and outside.

Also, these flags can lead to link errors when inline functions, which would normally be dllimported, refer to internal symbols of a DLL:


void internal();

struct __declspec(dllimport) S {
 void foo() { internal(); }
};

Normally, references to S::foo() would use the definition in the DLL, which also contains the definition of internal(), but when using /Zc:dllexportInlines-, the inline definition of S::foo() is used directly, resulting in a link error since no definition of internal() can be found.

Even worse, if there is an inline definition of internal() containing a static local variable, the program will now refer to a different instance of that variable than in the DLL:


inline int internal() { static int x; return x++; }

struct __declspec(dllimport) S {
 int foo() { return internal(); }
};

This could lead to very subtle bugs. However, since Chrome already uses -fvisibility-inlines-hidden, which has the same potential problem, we believe this is not a common issue.

Summary

/Zc:dllexportInlines- is like -fvisibility-inlines-hidden for DLLs and significantly reduces build times. We're excited that using Clang on Windows allows us to benefit from new features like this.

More information

For more information, see the User's Manual for /Zc:dllexportInlines-.

The flag was added in Clang r346069, which will be part of the Clang 8 release expected in March 2019. It's also available in the Windows Snapshot Build.

Acknowledgements

/Zc:dllexportInlines- was implemented by Takuto Ikuta based on a prototype by Nico Weber.

Integration of libc++ and OpenMP packages into llvm-toolchain

Tue, 25 Sep 2018 08:29:00 +0000

A bit more than a year ago, we gave an update about recent changes in apt.llvm.org. Since then, we have seen a significant increase in the usage of the service. Just last month, more than 16.5TB of data was transferred from our CDN.

Thanks to Google Summer of Code 2018, and after a number of requests, we decided to focus our energy on bringing great new projects from the LLVM ecosystem into apt.llvm.org.

Starting from version 7, libc++, libc++abi and OpenMP packages are available as part of the llvm-toolchain packages. This means that, just like clang, lld or lldb, the libc++, libc++abi and OpenMP packages are also built, tested and shipped on https://apt.llvm.org/.

The integration focuses on preserving the current usage of these libraries. The newly merged packages have adopted the llvm-toolchain versioning:

libc++ packages

libc++1-7
libc++-7-dev

libc++abi packages

libc++abi1-7
libc++abi-7-dev

OpenMP packages

libomp5-7
libomp-7-dev
libomp-7-doc

These packages are built twice a day for trunk. For version 7, they are rebuilt only when new changes happen in the SVN branches.

Integration of libc++* packages

Both libc++ and libc++abi packages are built at the same time using the clang built during the process. The existing libc++ and libc++abi packages present in Debian and Ubuntu repositories will not be affected (they will be removed at some point). Newly integrated libcxx* packages are not co-installable with them.

Symlinks have been provided from the original locations to keep the library usage the same.

Example: /usr/lib/x86_64-linux-gnu/libc++.so.1.0 -> /usr/lib/llvm-7/lib/libc++.so.1.0

The usage of the libc++ remains super easy:

Usage:

$ clang++-7 -std=c++11 -stdlib=libc++ foo.cpp

$ ldd ./a.out|grep libc++

libc++.so.1 => /usr/lib/x86_64-linux-gnu/libc++.so.1 (0x00007f62a1a90000)

libc++abi.so.1 => /usr/lib/x86_64-linux-gnu/libc++abi.so.1 (0x00007f62a1a59000)

In order to test new developments in libc++, we are also building the experimental features.

For example, the following command will work out of the box:

$ clang++-7 -std=c++17 -stdlib=libc++ foo.cpp -lc++experimental -lc++fs
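
As an illustration of what such a foo.cpp might contain, here is a minimal sketch that uses the C++17 filesystem library, which libc++ 7 is assumed to provide through the extra -lc++fs library. The helper names are hypothetical, and the code only manipulates path strings, so it does not touch the disk.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Return the last component of a path, e.g. the library file name.
inline std::string library_file_name(const std::string& full_path) {
    return fs::path(full_path).filename().string();
}

// Return everything before the last component.
inline std::string library_directory(const std::string& full_path) {
    return fs::path(full_path).parent_path().string();
}

// Self-check using the symlink target mentioned in this post.
inline void check_path_helpers() {
    const std::string target = "/usr/lib/llvm-7/lib/libc++.so.1.0";
    assert(library_file_name(target) == "libc++.so.1.0");
    assert(library_directory(target) == "/usr/lib/llvm-7/lib");
}
```

On newer toolchains the filesystem library is typically part of the main standard library, so the extra link flags may no longer be needed there.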

Integration of OpenMP packages

While OpenMP packages have been present in the Debian and Ubuntu archives for a while, only a single version of the package was available.

For now, the newly integrated packages create a symlink from /usr/lib/libomp.so.5 to /usr/lib/llvm-7/lib/libomp.so.5, keeping the current usage the same and making them non co-installable.

It can be used with clang through the -fopenmp flag:

$ clang -fopenmp foo.c
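
For readers who want something to try the flag on, here is a small sketch of an OpenMP loop. The function name is hypothetical, and the example is written in C++ (so it would be built with clang++ -fopenmp rather than the C command above). Without -fopenmp the pragma is ignored and the loop simply runs serially with the same result.

```cpp
#include <cassert>

// Sum the squares of 0..n-1. With -fopenmp the iterations are shared between
// threads and the per-thread partial sums are combined by the reduction
// clause; without it the loop runs on a single thread.
inline long long sum_of_squares(int n) {
    long long total = 0;
    #pragma omp parallel for reduction(+ : total)
    for (int i = 0; i < n; ++i) {
        total += static_cast<long long>(i) * i;
    }
    return total;
}

// Self-check against the closed form (n-1)n(2n-1)/6.
inline void check_sum_of_squares() {
    assert(sum_of_squares(0) == 0);
    assert(sum_of_squares(4) == 14);              // 0 + 1 + 4 + 9
    assert(sum_of_squares(1000) == 332833500LL);  // 999*1000*1999/6
}
```

The reduction clause is what keeps the parallel version correct: each thread accumulates into its own copy of total, and the copies are added together once the loop finishes.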

The dependency packages providing the default libc++* and OpenMP package are also integrated in llvm-defaults. This means that the following command will install all these new packages at the current version:

$ apt-get install libc++-dev libc++abi-dev libomp-dev

LLVM 7 => 8 transition

In parallel with the libc++ and OpenMP work, https://apt.llvm.org/ has been updated to reflect the branching of 7 from the trunk branches.

Therefore, we have currently on the platform:

Stable	6.0
Qualification	7
Development	8

Please note that, from version 7, the packages and libraries are called 7 (and not 7.0).

For the rationale and implementation, see https://reviews.llvm.org/D41869 & https://reviews.llvm.org/D41808.

Stable packages of LLVM toolchain are already officially available in Debian Buster and in Ubuntu Cosmic.

Cosmic support

In order to make sure that the LLVM toolchain does not have too many regressions with this new version, we also support the next Ubuntu version, 18.10, aka Cosmic.

A Note on coinstallability

We tried to make them coinstallable, but in the resulting packages we had no control over the libraries used at runtime. This could lead to many unforeseen issues. Keeping these in mind, we settled on keeping them in conflict with other versions.

Future work

Code coverage build fails for newly integrated packages
Move to a two-phase build to generate the clang binary using clang

Sources of the project are available on the gitlab instance of Debian: https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/tree/7

Reshabh Sharma & Sylvestre Ledru

Announcing the new LLVM Foundation Board of Directors

Tue, 18 Sep 2018 09:00:00 +0000

The LLVM Foundation is pleased to announce its new Board of Directors:

Chandler Carruth
Mike Edwards (Treasurer)
Hal Finkel
Arnaud de Grandmaison
Anton Korobeynikov
Tanya Lattner (President)
Chris Lattner
John Regehr (Secretary)
Tom Stellard

Two new members and seven continuing members were elected to the nine person board.

We want to thank David Kipping for his 2 terms on the board. David has been actively involved with the LLVM Developer Meetings and was the treasurer for the past 4 years. The treasurer is a time-demanding position: the treasurer supports the day-to-day operation of the foundation, balances the books, and generates monthly treasurer reports.

We also want to thank all the applicants to the board. When voting on new board members, we took into consideration all contributions (past and present) and current involvement in the LLVM community. We also tried to create a balanced board of individuals from a wide range of backgrounds and locations to provide a voice to as many groups within the LLVM community as possible. Given these criteria and strong applicants, we increased the board from 8 members to 9.

About the board of directors (listed alphabetically by last name):

Chandler Carruth:

Chandler Carruth has been an active contributor to LLVM since 2007. Over the years, he has worked on LLVM's memory model and atomics, Clang's C++ support, GCC-compatible driver, initial profile-aware code layout optimization pass, pass manager, IPO infrastructure, and much more. He is the current code owner of inlining and SSA formation.

In addition to his numerous technical contributions, Chandler has led Google's LLVM efforts since 2010 and shepherded a number of new efforts that have positively and significantly impacted the LLVM project. These new efforts include things such as adding C++ modules to Clang, adding address and other sanitizers to Clang/LLVM, making Clang compatible with MSVC and available to the Windows C++ developer community, and much more.

Chandler works at Google Inc. as a technical lead for their C++ developer platform and has served on the LLVM Foundation board of directors for the last 4 years.

Mike Edwards:

Mike Edwards is a relative newcomer to the LLVM community, beginning his involvement just a few years ago while working for Sony Playstation. Finding the LLVM community to be an incredibly amazing and welcoming group of people, Mike knew he had to find a way to contribute. Mike's previous work in DevOps led him to get involved in helping to work on the llvm.org infrastructure. Last year, with the help of the Board and several community members, Mike was able to get the llvm.org infrastructure moved onto a modern compute platform at Amazon Web Services. Mike is one of the maintainers of our llvm.org infrastructure.

Mike is currently working as a Software Engineer at Apple, Inc. working on the Continuous Integration and Quality Engineering efforts for LLVM and Clang development.

Hal Finkel:

Hal Finkel has been an active contributor to the LLVM project since 2011. He is the code owner for the PowerPC target, the alias-analysis infrastructure, and other components.

In addition to his numerous technical contributions, Hal has chaired the LLVM in HPC workshop, which is held in conjunction with Super Computing (SC), for the last five years. This workshop provides a venue for the presentation of peer-reviewed HPC-related LLVM research from both industry and academia. He has also been involved in organizing an LLVM-themed BoF session at SC and LLVM socials in Austin.

Hal is Lead for Compiler Technology and Programming Languages at Argonne National Laboratory's Leadership Computing Facility.

Arnaud de Grandmaison:

Arnaud de Grandmaison has been hacking on LLVM projects since 2008. In addition to his open source contributions, he has worked for many years on private out-of-tree LLVM-based projects at Parrot, DiBcom, or Arm. He has also been a leader in the European LLVM community by organizing the EuroLLVM Developers' meeting, Paris socials, and chaired or participated in numerous program committees for the LLVM Developers' Meetings and other LLVM related conferences.

Arnaud has attended numerous LLVM Developers' meetings and volunteered as moderator or presented as well. He also moderates several LLVM mailing lists. Arnaud is also very involved in community wide discussions and decisions such as re-licensing and code of conduct.

Arnaud is a Senior Principal Engineer at Arm.

Anton Korobeynikov:

In addition to his technical contributions, Anton has maintained LLVM's participation in Google Summer of Code by managing applications, deadlines, and overall organization. He also supports the LLVM infrastructure and has been on numerous program committees for the LLVM Developers' Meetings (both US and EuroLLVM).

Anton is currently an associate professor at the Saint Petersburg State University and has served on the LLVM Foundation board of directors for the last 4 years.

Tanya Lattner:

Tanya Lattner has been involved in the LLVM project for over 14 years. She began as a graduate student who wrote her master's thesis using LLVM, and continued on using and extending LLVM technologies at various jobs during her career as a compiler engineer.

Tanya has been organizing the US LLVM Developers' meeting since 2008 and attended every developer meeting. She was the LLVM release manager for 3 years, moderates the LLVM mailing lists, and helps administer the LLVM infrastructure servers, mailing lists, bugzilla, etc. Tanya has also been on the program committee for the US LLVM Developers' meeting (4+ years) and the EuroLLVM Developers' Meeting.

With the support of the initial board of directors, Tanya created the LLVM Foundation, defined its charitable and education mission, and worked to get 501(c)(3) status.

Tanya is the Chief Operating Officer and has served as the President of the LLVM Foundation board for the last 4 years.

Chris Lattner:

Chris Lattner is well known as the founder of the LLVM project and has a lengthy history of technical contributions to the project over the years. He drove much of the early implementation, architecture, and design of LLVM and Clang.

Chris has attended every LLVM Developers' meeting, and presented at many of them. He helped drive the conception and incorporation of the LLVM Foundation, and has served as its secretary. Chris also grants commit access to the LLVM Project, moderates mailing lists, moderates and edits the LLVM blog, and drives important non-technical discussions and policy decisions related to the LLVM project.

Chris manages a team building machine learning infrastructure at Google and has served on the LLVM Foundation board of directors for the last 4 years.

John Regehr:

John Regehr has been involved in LLVM for a number of years. As a professor of computer science at the University of Utah, his research specializes in compiler correctness and undefined behavior. He is well known within the LLVM community for the hundreds of bug reports his group has reported to LLVM/Clang.

John was a project lead for IOC, a Clang based integer overflow checker that eventually became the basis for the integer parts of UBSan. He was also the primary developer of C-Reduce which utilizes Clang as a library and is often used as a test case reducer for compiler issues.

In addition to his technical contributions, John has served on several LLVM-related program committees. He also has a widely read blog about LLVM and other compiler-related issues (Embedded in Academia).

Tom Stellard:

Tom is currently a Software Engineer at Red Hat and is the technical lead for emerging toolchains including Clang/LLVM. He also maintains the LLVM packages for the Fedora project.

Announcing the program for the 2018 LLVM Developers' Meeting Bay Area

Fri, 31 Aug 2018 16:29:00 +0000

The LLVM Foundation is excited to announce the program for the 2018 LLVM Developers' Meeting in San Jose, CA on October 17 & 18.
As a reminder, ticket prices for the event will increase on September 17th. Purchase your tickets today!
Technical Talks

Lessons Learned Implementing Common Lisp with LLVM over Six Years - Christian Schafmeister
Porting Function merging pass to thinlto - Aditya Kumar
Build Impact of Explicit and C++ Standard Modules - David Blaikie
Profile Guided Code Layout in LLVM and LLD - Michael Spencer
Developer Toolchain for the Nintendo Switch - Bob Campbell, Jeff Sirois
Methods for Maintaining OpenMP Semantics without Being Overly Conservative - Jin Lin, Ernesto Su, Xinmin Tian
Understanding the performance of code using LLVM's Machine Code Analyzer (llvm-mca) - Andrea Di Biagio, Matt Davis
Art Class for Dragons: Supporting GPU compilation without metadata hacks! - Neil Hickey
Implementing an OpenCL compiler for CPU in LLVM - Evgeniy Tyurin
Working with Standalone Builds of LLVM sub-projects - Tom Stellard
Loop Transformations in LLVM: The Good, the Bad, and the Ugly - Michael Kruse, Hal Finkel
Efficiently Implementing Runtime Metadata with LLVM - Joe Groff, Doug Gregor
Coroutine Representations and ABIs in LLVM - John McCall
Glow: LLVM-based machine learning compiler - Nadav Rotem, Jakob Olesen
Graph Program Extraction and Device Partitioning in Swift for TensorFlow - Mingsheng Hong, Chris Lattner
Memory Tagging, how it improves C++ memory safety, and what does it mean for compiler optimizations - Kostya Serebryany, Evgenii Stepanov, Vlad Tsyrklevich
Improving code reuse in clang tools with clangmetatool - Daniel Ruoso
Sound Devirtualization in LLVM - Piotr Padlewski, Krzysztof Pszeniczny
Extending the SLP vectorizer to support variable vector widths - Vasileios Porpodas, Rodrigo C. O. Rocha, Luís F. W. Góes
Revisiting Loop Fusion, and its place in the loop transformation framework. - Johannes Doerfert, Kit Barton, Hal Finkel, Michael Kruse
Optimizing Indirections, using abstractions without remorse. - Johannes Doerfert, Hal Finkel
Outer Loop Vectorization in LLVM: Current Status and Future Plans - Florian Hahn, Satish Guggilla, Diego Caballero
Stories from RV: The LLVM vectorization ecosystem - Simon Moll, Matthias Kurtenacker, Sebastian Hack
Faster, Stronger C++ Analysis with the Clang Static Analyzer - George Karpenkov, Artem Dergachev

Tutorials

Updating ORC JIT for Concurrency - Lang Hames, Breckin Loggins
Register Allocation: More than Coloring - Matthias Braun
How to use LLVM to optimize your parallel programs - William S. Moses
LLVM backend development by example (RISC-V) - Alex Bradbury

Birds of a Feather

Debug Info BoF - Vedant Kumar, Adrian Prantl
Lifecycle of LLVM bug reports - Kristof Beyls, Paul Robinson
GlobalISel Design and Development - Amara Emerson
Migrating to C++14, and beyond! - JF Bastien
Ideal versus Reality: Optimal Parallelism and Offloading Support in LLVM - Xinmin Tian, Hal Finkel, TB Schardl, Johannes Doerfert, Vikram Adve
Implementing the parallel STL in libc++ - Louis Dionne
Clang Static Analyzer BoF - Devin Coughlin
LLVM Foundation BoF - LLVM Foundation Board of Directors

Lightning Talks

Automatic Differentiation in C/C++ Using Clang Plugin Infrastructure - Vassil Vassilev, Aleksandr Efremov
More efficient LLVM devs: 1000x faster build file generation, -j1000 builds, and O(1) test execution - Nico Weber
Heap-to-Stack Conversion - Hal Finkel
TWINS - This Workflow is Not Scrum: Adapting Agile for Open Source Interaction - Joshua Magee
Mutating the clang AST from Plugins - Andrei Homescu, Per Larsen
atJIT: an online, feedback-directed optimizer for C++ - Kavon Farvardin, Hal Finkel, Michael Kruse, John Reppy
Repurposing GCC Regression for LLVM Based Tool Chains - Jeremy Bennett, Simon Cook, Ed Jones
ThinLTO Summaries in JIT Compilation - Stefan Gränitz
Refuting False Bugs in the Clang Static Analyzer using SMT Solvers - Mikhail R. Gadelha
What’s New In Outlining - Jessica Paquette
DWARF v5 Highlights - Why You Care - Paul Robinson, Pavel Labath, Wolfgang Pieb
Using TAPI to Understand APIs and Speed Up Builds - Steven Wu, Juergen Ributzka
Hardware Interference Size - JF Bastien
Dex: efficient symbol index for Clangd - Kirill Bobyrev, Eric Liu, Sam McCall, Ilya Biryukov
Flang Update - Steve Scalpone
clang-doc: an elegant generator for more civilized documentation - Julie Hockett
Code Coverage with CPU Performance Monitoring Unit - Ivan Baev, Bharathi Seshadri, Stefan Pejic
VecClone Pass: Function Vectorization via LoopVectorizer - Matt Masten, Evgeniy Tyurin, Konstantina Mitropoulou
ISL Memory Management Using Clang Static Analyzer - Malhar Thakkar, Ramakrishna Upadrasta
Eliminating always_inline in libc++: a journey of visibility and linkage - Louis Dionne
Error Handling in Libraries: A Case Study - James Henderson

Posters

Gaining fine-grain control over pass management - Serge Guelton, Adrien Guinet, Pierrick Brunet, Juan Manuel Martinez, Béatrice Creusillet
Integration of OpenMP, libcxx and libcxxabi packages into LLVM toolchain - Reshabh Sharma
Improving Debug Information in LLVM to Recover Optimized-out Function Parameters - Ananthakrishna Sowda, Djordje Todorovic, Nikola Prica, Ivan Baev
Automatic Compression for LLVM RISC-V - Sameer AbuAsal, Ana Pazos
Guaranteeing the Correctness of LLVM RISC-V Machine Code with Fuzzing - Jocelyn Wei, Ana Pazos, Mandeep Singh Grang
NEC SX-Aurora - A Scalable Vector Architecture - Kazuhisa Ishizaka, Kazushi Marukawa, Erich Focht, Simon Moll, Matthias Kurtenacker, Sebastian Hack
Extending Clang Static Analyzer to enable Cross Translation Unit Analysis - Varun Subramanian
Leveraging Polyhedral Compilation in Chapel Compiler - Siddharth Bhat, Michael Ferguson, Philip Pfaffe, Sahil Yerawar

2018 LLVM Foundation's Women in Compilers and Tools Workshop

Thu, 23 Aug 2018 22:41:00 +0000

The LLVM Foundation is excited to announce our first half-day Women in Compilers and Tools Workshop held the day before the 2018 LLVM Developers’ Meeting - Bay Area. The workshop will be held at the Fairmont Hotel on October 16th from 1:00-6:30PM and includes a cocktail reception.

This event aims to connect women in the field of compilers and tools and provide them with ideas and techniques to overcome barriers or enhance their careers. It is also open to anyone (not just women) who is interested in increasing diversity within the LLVM community, their workplace or university.

Registration for the event will open on Monday, August 27th at 9:00AM PDT. Attendance is limited to 100 attendees and tickets will be priced at $50 (students $25). Please see the EventBrite registration page for details.

The workshop will consist of 3 topics described below:

Inner Critic: How to Deal with Your Imposter Syndrome

Presented by Women Catalysts

You're smart. People really like you. And yet, you can't shake the feeling that maybe you don't really deserve your success. Or that someone else can do what you do better...and what if your boss can see it too? You are not alone: it's called the Imposter Syndrome. Believe it or not, the most confident and successful people often fear that they are actually inadequate. The great Maya Angelou once said, "I have written 11 books, but each time I think, 'Uh-oh, they're going to find out now. I've run a game on everybody, and they're going to find me out.'" But it doesn't have to be that way. In this workshop, you'll learn to identify the voice of your Imposter Syndrome and develop strategies for dealing with your inner critics.

Present! A Techie's Guide to Public Speaking

Presented by Karen Catlin

To grow your career, you know what you need to do: improve your public speaking skills.

Public speaking provides the visibility and professional credibility that helps you score the next big opportunity. But even more important is the fact that it transforms the way you communicate. Improved confidence and the ability to convey messages clearly will impact your relationships with your managers, coworkers, customers, industry peers, and even potential new hires.

In this presentation, Karen Catlin will cover the importance of speaking at conferences and events, along with strategies to get started. She'll share some favorite tips from the book she co-authored with Poornima Vijayashanker, "Present! A Techie's Guide to Public Speaking." And she'll tell some embarrassing stories that are just too good to keep to herself.

About Karen: After spending 25 years building software products, Karen Catlin is now an advocate for women in the tech industry. She’s a leadership coach, a keynote and TEDx speaker, and co-author of "Present! A Techie’s Guide to Public Speaking."

Formerly, Karen was a vice president of engineering at Macromedia and Adobe.

Karen holds a computer science degree from Brown University and serves as an advisor to Brown's Computer Science Diversity Initiative. She’s also on the Advisory Boards for The Women’s CLUB of Silicon Valley and WEST (Women Entering & Staying in Technology).

Update on Women in Compilers & Tools Program

Presented by Tanya Lattner

Over the past year we have hosted panels and BoFs on women in compilers and tools. We now need to take many of the items discussed during the events and put them into action. We will discuss some key areas and potentially break into smaller groups to determine action plans and steps to move forward.

FAQ:

Do I need to attend the LLVM Developers’ Meeting to attend this event?

This is an independent event which is open to anyone.

Is this a women only event?

Anyone who values diversity within the field of compilers and tools is welcome to attend. These topics can relate to anyone, not just women, and our mission is to improve inclusion and diversity in general.

Is there a financial hardship discount?

We have discounted the tickets for all attendees but please reach out to the organizer and we will decide on a case by case basis.

DragonFFI: FFI/JIT for the C language using Clang/LLVM

Tue, 13 Mar 2018 07:45:00 +0000

Introduction

A foreign function interface is "a mechanism by which a program written in one programming language can call routines or make use of services written in another".
In the case of DragonFFI, we expose a library that allows calling C functions and using C structures from any language. Basically, we want to be able to do this, let's say in Python:

import pydffi
CU = pydffi.FFI().cdef("int puts(const char* s);");
CU.funcs.puts("hello world!")

or, in a more advanced way, for instance to use libarchive directly from Python:

import pydffi
pydffi.dlopen("/path/to/libarchive.so")
CU = pydffi.FFI().cdef("#include <archive.h>")
a = CU.funcs.archive_read_new()
assert a
...

This blog post presents related work and its drawbacks, then how Clang/LLVM is used to circumvent these drawbacks, the inner workings of DragonFFI, and further ideas.
The code of the project is available on GitHub: https://github.com/aguinet/dragonffi. Python 2/3 wheels are available for Linux/OSX x86/x64. Python 3.6 wheels are available for Windows x64. On all these architectures, just use:

$ pip install pydffi

and play with it :)

See below for more information.

Related work

libffi is the reference library that provides an FFI for the C language. cffi is a Python binding around this library that also uses PyCParser to be able to easily declare interfaces and types. Both these libraries have limitations, among them:

libffi does not support the Microsoft x64 ABI under Linux x64. It isn't that trivial to add a new ABI (hand-written ABI, get the ABI right, ...), while a lot of effort has already been put into compilers to get these ABIs right.
PyCParser only supports a very limited subset of C (no includes, function attributes, ...).

Moreover, in 2014, Jordan Rose and John McCall from Apple gave a talk at the LLVM developer meeting in San José about how Clang can be used for C interoperability. This talk also shows various ABI issues, and was a source of inspiration for DragonFFI at the beginning.

Somewhat related, Sean Callanan, who worked on lldb, gave a talk in 2017 at the LLVM developer meeting in San José on how we could use parts of Clang/LLVM to implement some kind of eval() for C++. What can be learned from this talk is that debuggers like lldb must also be able to call an arbitrary C function, and use debug information among other things to solve it (which is what we also do, see below :)).

DragonFFI is based on Clang/LLVM, and thanks to that it is able to get around these issues:

it uses Clang to parse header files, allowing direct usage of a C library headers without adaptation;
it supports as many calling conventions and function attributes as Clang/LLVM do;
as a bonus, Clang and LLVM allow on-the-fly compilation of C functions, without relying on the presence of a compiler on the system (you still need the headers of the system's libc though, or MSVCRT headers under Windows);
and this is a good way to have fun with Clang and LLVM! :)

Let's dive in!

Creating an FFI library for C

Supporting C ABIs

A C function is always compiled for a given C ABI. The C ABI isn't defined by the official C standards, and is system/architecture-dependent. Lots of things are defined by these ABIs, and they can be quite error-prone to implement.

To see how ABIs can become complex, let's compile this C code:

typedef struct {
 short a;
 int b;
} A;

void print_A(A s) {
 printf("%d %d\n", s.a, s.b);
}

Compiled for Linux x64, it gives this LLVM IR:

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

@.str = private unnamed_addr constant [7 x i8] c"%d %d\0A\00", align 1

define void @print_A(i64) local_unnamed_addr {
 %2 = trunc i64 %0 to i32
 %3 = lshr i64 %0, 32
 %4 = trunc i64 %3 to i32
 %5 = shl i32 %2, 16
 %6 = ashr exact i32 %5, 16
 %7 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([7 x i8], [7 x i8]* @.str, i64 0, i64 0), i32 %6, i32 %4)
 ret void
}

What happens here is what is called structure coercion. To optimize some function calls, some ABIs pass structure values through registers. For instance, an llvm::ArrayRef object, which is basically a structure with a pointer and a size (see https://github.com/llvm-mirror/llvm/blob/release_60/include/llvm/ADT/ArrayRef.h#L51), is passed through registers (though this optimization isn't guaranteed by any standard).
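To make the IR above more concrete, here is a minimal Python sketch that mimics what the callee does with its single i64 argument. It assumes the little-endian layout used on Linux x64, where a occupies bits 0-15 and b occupies bits 32-63. The helper names coerce_A, sign_extend and decode_A are made up for this illustration and are not part of Clang or DragonFFI.

```python
import struct

def coerce_A(a, b):
    """Pack struct { short a; int b; } into one 64-bit integer."""
    # "<hxxi": little-endian short, 2 bytes of padding, then a 32-bit int.
    raw = struct.pack("<hxxi", a, b)
    return int.from_bytes(raw, "little")

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def decode_A(reg):
    """Mirror the IR: recover a and b from the coerced 64-bit value."""
    low32 = reg & 0xFFFFFFFF          # %2 = trunc i64 %0 to i32
    b = sign_extend(reg >> 32, 32)    # %3 = lshr 32, %4 = trunc to i32
    a = sign_extend(low32, 16)        # %5 = shl 16, %6 = ashr 16
    return a, b

reg = coerce_A(-2, 100000)
print(decode_A(reg))
```

The shl/ashr pair in the IR plays the role of sign_extend(low32, 16): it discards whatever sits in the padding bits 16-31 and sign-extends the short.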

It is important to understand that ABIs are complex things to implement and we don't want to redo this whole work by ourselves, particularly when LLVM/Clang already know how.

Finding the right type abstraction

We want to list every type that is used in a parsed C file. To achieve that goal, various pieces of information are needed, among which:

the function types, and their calling convention
for structures: field offsets and names
for union/enums: field names (and values)

On one hand, we have seen in the previous section that the LLVM IR is too Low Level (as in Low Level Virtual Machine) for this. On the other hand, Clang's AST is too high level. Indeed, let's print the Clang AST of the code above:

[...]
|-RecordDecl 0x5561d7f9fc20 <a.c:1:9, line:4:1> line:1:9 struct definition
| |-FieldDecl 0x5561d7ff4750 <line:2:3, col:9> col:9 referenced a 'short'
| `-FieldDecl 0x5561d7ff47b0 <line:3:3, col:7> col:7 referenced b 'int'

We can see that there is no information about the structure layout (padding, ...). There's also no information about the size of standard C types. As all of this depends on the backend used, it is not surprising that this information is not present in the AST.

The right abstraction appears to be the LLVM metadata produced by Clang to emit DWARF or PDB structures. They provide structure field offsets and names, various basic type descriptions, and function calling conventions. Exactly what we need! For the example above, this gives (at the LLVM IR level, with some inline comments):

target triple = "x86_64-pc-linux-gnu"
%struct.A = type { i16, i32 }
@.str = private unnamed_addr constant [7 x i8] c"%d %d\0A\00", align 1

define void @print_A(i64) local_unnamed_addr !dbg !7 {
 %2 = trunc i64 %0 to i32
 %3 = lshr i64 %0, 32
 %4 = trunc i64 %3 to i32
 tail call void @llvm.dbg.value(metadata i32 %4, i64 0, metadata !18, metadata !19), !dbg !20
 tail call void @llvm.dbg.declare(metadata %struct.A* undef, metadata !18, metadata !21), !dbg !20
 %5 = shl i32 %2, 16, !dbg !22
 %6 = ashr exact i32 %5, 16, !dbg !22
 %7 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([...] @.str, i64 0, i64 0), i32 %6, i32 %4), !dbg !23
 ret void, !dbg !24
}

[...]
; DISubprogram defines (in our case) a C function, with its full type
!7 = distinct !DISubprogram(name: "print_A", scope: !1, file: !1, line: 6, type: !8, [...], variables: !17)
; This defines the type of our subprogram
!8 = !DISubroutineType(types: !9)
; We have the "original" types used for print_A, with the first one being the
; return type (null => void), and the other ones the arguments (in !10)
!9 = !{null, !10}
!10 = !DIDerivedType(tag: DW_TAG_typedef, name: "A", file: !1, line: 4, baseType: !11)
; This defines our structure, with its various fields
!11 = distinct !DICompositeType(tag: DW_TAG_structure_type, file: !1, line: 1, size: 64, elements: !12)
!12 = !{!13, !15}
; We have here the size and name of the member "a". Offset is 0 (default value)
!13 = !DIDerivedType(tag: DW_TAG_member, name: "a", scope: !11, file: !1, line: 2, baseType: !14, size: 16)
!14 = !DIBasicType(name: "short", size: 16, encoding: DW_ATE_signed)
; We have here the size, offset and name of the member "b"
!15 = !DIDerivedType(tag: DW_TAG_member, name: "b", scope: !11, file: !1, line: 3, baseType: !16, size: 32, offset: 32)
!16 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
[...]
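As a cross-check of the sizes and offsets recorded in the metadata above, the same layout can be queried from Python with the standard ctypes module, which follows the platform's native C layout rules. This is only a sketch: the describe helper is a made-up name, and the values it reports depend on the platform it runs on.

```python
import ctypes

class A(ctypes.Structure):
    # Same fields as the C typedef: struct { short a; int b; }
    _fields_ = [("a", ctypes.c_short), ("b", ctypes.c_int)]

def describe(struct_type):
    """Return (name, offset in bits, size in bits) per field, DWARF-style."""
    return [
        (name, getattr(struct_type, name).offset * 8, ctypes.sizeof(ftype) * 8)
        for name, ftype in struct_type._fields_
    ]

print(ctypes.sizeof(A) * 8)  # total size of the structure, in bits
print(describe(A))
```

On Linux x64 this should agree with the DICompositeType and DW_TAG_member entries: a total size of 64 bits, a at offset 0 with size 16, and b at offset 32 with size 32.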

Internals

DragonFFI first parses the debug information included by Clang in the LLVM IR it produces, and creates a custom type system to represent the various function types, structures, enumerations and typedefs of the parsed C file. This custom type system has been created for two reasons:

create a type system that gathers only the necessary information from the metadata tree (we don't need the whole debug information)
make the public headers of the DragonFFI library free from any LLVM headers (so that the whole LLVM headers aren't needed to use the library)

Once we've got this type system, the DragonFFI API for calling C functions is this one:

DFFI FFI([...]);
// This will declare puts as a function that returns int and takes a const
// char* as an argument. We could also create this function type by hand.
CompilationUnit CU = FFI.cdef("int puts(const char* s);", [...]);
NativeFunc F = CU.getFunction("puts");
const char* s = "hello world!";
void* Args[] = {&s};
int Ret;
F.call(&Ret, Args);

So, basically, a pointer to the returned data and an array of void* is given to DragonFFI. Each void* value is a pointer to the data that must be passed to the underlying function. So the last missing piece of the puzzle is the code that takes this array of void* (and pointer to the returned data) and calls puts, so a function like this:

void call_puts(void* Ret, void** Args) {
 *((int*)Ret) = puts((const char*) Args[0]);
}

We call these "function wrappers" (how original! :)). One advantage of this signature is that it is a generic signature, which can be used in the implementation of DragonFFI. Supposing we manage to compile at run-time this function, we can then call it trivially as in the following:

typedef void(*puts_call_ty)(void*, void**);
puts_call_ty Wrapper = /* pointer to the compiled wrapper function */;
Wrapper(Ret, Args);

Generating and compiling a function like this is something Clang/LLVM is able to do. For the record, this is also what libffi mainly does, by generating the necessary assembly by hand. We limit the number of these wrappers in DragonFFI by generating one per distinct function type. So the wrapper that would actually be generated for puts is this one:

void __dffi_wrapper_0(int32_t( __attribute__((cdecl)) *__FPtr)(char *), int32_t *__Ret, void** __Args) {
 *__Ret = (__FPtr)(*((char **)__Args[0]));
}
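The idea of one wrapper per function type can be sketched in Python with ctypes. This is an illustration of the calling scheme only, not DragonFFI code: the target function add and the name wrapper_int_int_int are hypothetical, and add is built from a Python callable simply so that the example needs no shared library.

```python
import ctypes

# Hypothetical target with C signature int(int, int).
ADD_TYPE = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)
add = ADD_TYPE(lambda a, b: a + b)

def wrapper_int_int_int(fptr, ret_addr, arg_addrs):
    """Wrapper shared by every function of type int(int, int).

    ret_addr is the address where the result is stored; arg_addrs holds the
    address of each argument, like the void** array in the C version.
    """
    a = ctypes.c_int.from_address(arg_addrs[0]).value
    b = ctypes.c_int.from_address(arg_addrs[1]).value
    ctypes.c_int.from_address(ret_addr).value = fptr(a, b)

x, y, ret = ctypes.c_int(2), ctypes.c_int(40), ctypes.c_int(0)
args = (ctypes.c_void_p * 2)(ctypes.addressof(x), ctypes.addressof(y))
wrapper_int_int_int(add, ctypes.addressof(ret), args)
print(ret.value)
```

Because the function pointer is a parameter, the same wrapper serves any other function with the same type, which is why the number of wrappers grows with the number of distinct function types rather than the number of functions.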

For now, all the necessary wrappers are generated when the DFFI::cdef or DFFI::compile APIs are used. The only exception, where they are generated on-the-fly (when calling CompilationUnit::getFunction), is for variadic arguments. One possible evolution is to let the user choose whether they want this to happen on-the-fly or not for every declared function.

Issues with Clang

There is one major issue with Clang that we need to hack around in order to have the DFFI::cdef functionality: unused declarations aren't emitted by Clang (even when using -g -femit-all-decls).

Here is an example, produced from the following C code:

typedef struct {
 short a;
 int b;
} A;

void print_A(A s);

$ clang -S -emit-llvm -g -femit-all-decls -o - a.c |grep print_A |wc -l
0

The produced LLVM IR does not contain a function named print_A! The hack we temporarily use parses the Clang AST and generates temporary functions that look like this:

void __dffi_force_decl_print_A(A s) { }

This forces LLVM to generate an empty function named __dffi_force_decl_print_A with the right arguments (and associated debug information).
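The shape of these generated stubs can be sketched with a few lines of Python. DragonFFI itself works on the Clang AST; the force_decl_stub helper below is hypothetical and only formats a string from a name and a parameter list supplied by hand. It assumes the stub returns void, as in the single example shown above.

```python
def force_decl_stub(name, params):
    """Build an empty definition whose parameters match the declaration `name`.

    `params` is the parameter list exactly as written in the prototype,
    for instance "A s".
    """
    return "void __dffi_force_decl_{}({}) {{ }}".format(name, params)

print(force_decl_stub("print_A", "A s"))
```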

This is why DragonFFI proposes another API, DFFI::compile. This API does not force declared-only functions to be present in the LLVM IR, and will only expose functions that end up naturally in the LLVM IR after optimizations.

If someone has a better idea to handle this, please let us know!

Python bindings

Python bindings were the first ones to have been written, simply because it's the "high level" language I know best. Python provides its own set of challenges, but we will save that for another blog post. These Python bindings are built using pybind11, and provide their own set of C types. Lots of examples of what can be achieved can be found here and here.

Project status

DragonFFI currently supports Linux, OSX and Windows, running on Intel 32-bit and 64-bit CPUs. Travis is used for continuous integration, and every change is validated on all these platforms before being integrated.

The project will go from alpha to beta quality when version 0.3 is out (which will bring Travis and Appveyor CI integration and support for variadic functions). The project will be considered stable once these things happen:

user and developer documentation exists!
another foreign language is supported (JS? Ruby?)
the DragonFFI main library API is considered stable
a non-negligible list of tests has been added
all the things in the TODO file have been done :)

Various ideas for the future

Here are various interesting ideas we have for the future. We don't know yet when they will be implemented, but we think some of them could be quite nice to have.

Parse embedded DWARF information

As the entry point of DragonFFI is DWARF information, we could imagine parsing this debug information from shared libraries that embed it (or provide it in a separate file). The main advantage is that all the necessary information for doing the FFI right is in one file, so the header files are no longer required. The main drawback is that debug information tends to take a lot of space (for instance, DWARF information takes 1.8Mb for libarchive 3.3.2 compiled in release mode, for an original binary code size of 735Kb), and this brings us to the next idea.

Lightweight debug info?

The DWARF standard allows defining lots of information, and we don't need all of it in our case. We could imagine embedding only the necessary DWARF objects, that is just the necessary types to call the exported functions of a shared library. One experiment of this is available here: https://github.com/aguinet/llvm-lightdwarf. This is an LLVM optimisation pass that is inserted at the end of the optimisation pipeline, and parses metadata to keep only what is relevant for DragonFFI. More precisely, it only keeps the DWARF metadata related to exported and visible functions, with the associated types. It also keeps debug information of global variables, even though these aren't supported yet in DragonFFI. It also does some unconventional things, like replacing every file and directory by "_", to save space. "Fun" fact: to do this, it borrows some code from the LLVM bitcode "obfuscator" included in recent versions of Apple's clang, which is used to anonymize some information from the LLVM bitcode that is sent with tvOS/iOS applications (see http://lists.llvm.org/pipermail/llvm-dev/2016-February/095588.html for more information).

Enough talking, let's see some preliminary results (on Linux x64):

on libarchive 3.3.2, DWARF goes from 1.8Mb to 536Kb, for an original binary code size of 735Kb
on zlib 1.2.11, DWARF goes from 162Kb to 61Kb, for an original binary code size of 99Kb
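To put these figures in perspective, the short calculation below derives the share of DWARF data removed and the size of the remaining "light" DWARF relative to the code. It uses only the numbers quoted above; the assumption that 1Mb means 1024Kb is ours, and the summarize helper is a made-up name.

```python
KB_PER_MB = 1024  # assumption: the post's "Mb" means 1024 "Kb"

# (full DWARF, light DWARF, binary code size), all in Kb, from the list above
results = {
    "libarchive 3.3.2": (1.8 * KB_PER_MB, 536, 735),
    "zlib 1.2.11": (162, 61, 99),
}

def summarize(full, light, code):
    """Return (percent of DWARF removed, light DWARF size relative to code)."""
    removed = 100 * (1 - light / full)
    return round(removed, 1), round(light / code, 2)

for name, sizes in results.items():
    print(name, summarize(*sizes))
```

Under that assumption, the pass removes roughly 71% of the DWARF data for libarchive and 62% for zlib, while the remaining debug information is still about 0.73 and 0.62 times the size of the code itself.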

The instructions to reproduce this are available in the README of the LLVM pass repository.
We can conclude that defining this "light" DWARF format could be a nice idea. One other thing that could be done is defining a new binary format, that would be thus more space-efficient, but there are drawbacks going this way:

debug information is well supported on every platform nowadays: tools exist to parse it, embed/extract it from binaries, and so on
we already have DWARF and PDB: https://xkcd.com/927/

Nevertheless, it still could be a nice experiment to try this, figure out the space saved, and see if it is worth it!

As a final note, these two ideas would also benefit libffi, as we could process these formats and create libffi types!

JIT code from the final language (like Python) to native function code

One advantage of embedding a full working C compiler is that we could JIT the code from the final language glue to the final C function call, and thus limit the performance impact of this glue code.
Indeed, when a call is issued from Python, the following things happen:

arguments are converted from Python to C according to the function type
the function pointer and wrapper are gathered from DragonFFI
the final call is made

All this process involves basically a loop on the types of the arguments of the called function, which contains a big switch case. This loop generates the array of void* values that represents the C arguments, which is then passed to the wrapper. We could JIT a specialised version of this loop for the function type, inline the already-compiled wrapper and apply classical optimisation on top of the resulting IR, and get a straightforward conversion code specialized for the given function type, directly from Python to C.
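The difference between the generic loop and a version specialised for one function type can be illustrated without any JIT. In the sketch below, which is an analogy in plain Python rather than what DragonFFI does, the generic path looks up each argument's type on every call, while the specialised path resolves the types once and returns a converter dedicated to that function type. The CTYPES table and both helper names are hypothetical.

```python
import ctypes

# Hypothetical type tags and the ctypes type each one maps to.
CTYPES = {"int": ctypes.c_int, "double": ctypes.c_double, "char*": ctypes.c_char_p}

def convert_generic(arg_types, values):
    """Generic path: look up every argument's type on every call."""
    return [CTYPES[tag](value) for tag, value in zip(arg_types, values)]

def specialize(arg_types):
    """Specialised path: resolve the types once per function type."""
    converters = tuple(CTYPES[tag] for tag in arg_types)

    def convert(values):
        return [conv(value) for conv, value in zip(converters, values)]

    return convert

convert_int_double = specialize(("int", "double"))
boxed = convert_int_double((3, 1.5))
print([b.value for b in boxed])
```

A real JIT would go further by compiling the specialised conversion and inlining the wrapper, but the principle of moving the per-type dispatch out of the call path is the same.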

One idea we are exploring is combining easy::jit (hello fellow Quarkslab teammates!) with LLPE to achieve this goal.

Reducing DragonFFI library size

The DragonFFI shared library embeds statically compiled versions of LLVM and Clang. The size of the final shared library is about 55Mb (stripped, under Linux x64). This is really huge compared, for instance, to the 39Kb of libffi (also stripped, Linux x64)!

Here are some ideas to try and reduce this footprint:

compile DragonFFI, Clang and LLVM using (Thin) LTO, with visibility hidden for both Clang and LLVM. This could have the effect of removing code from Clang/LLVM that isn't used by DragonFFI.
make DragonFFI more modular:
- one core module that only has the parts from CodeGen that deal with ABIs. If the types and function prototypes are defined "by hand" (without DFFI::cdef), that's more or less the only part that is needed (with LLVM obviously)
- one optional module that includes the full clang compiler (to provide the DFFI::cdef and DFFI::compile APIs)

Even with all of this, it seems to be really hard to match the 39Kb of libffi, even if we remove the cdef/compile API from DragonFFI. As always, pick the right tool for your needs :)

Conclusion

Writing the first working version of DragonFFI has been a fun experiment, that made me discover new parts of Clang/LLVM :) The current goal is to try and achieve a first stable version (see above), and experiment with the various cited ideas.

It's a really long road, so feel free to join #dragonffi on FreeNode with any questions or suggestions you might have, or if you want to contribute!

Acknowledgments

Thanks to Serge «sans paille» Guelton for the discussions around the Python bindings, and for helping me find the name of the project :) (one of the most difficult tasks). Thanks also to him, Fernand Lone-Sang and Kévin Szkudlapski for their review of this blog post!

International Women's Day: Celebrating all the women in the LLVM Community!

Thu, 08 Mar 2018 11:10:00 +0000

Today is International Women's Day! To all the women in the LLVM community, thank you for all your contributions!

The LLVM Foundation values diversity within the LLVM community and the field of compilers and tools. Our Women in Compilers and Tools program began in 2015 with a birds of a feather discussion during the US LLVM Developers' Meeting and we have been expanding it over the years.

In 2017, we were a sponsor of the Grace Hopper Conference. With the help of community members Anna Zaks and David Blaikie, the LLVM Foundation had a booth at the career fair to introduce women to LLVM and encourage them to become contributors. It was very exciting to learn that many women knew of LLVM, were using it in their classes or research, using it in their career, or were interested in learning more. We hopefully encouraged more women to get involved with LLVM, compilers, and open source.

The LLVM Foundation was also a sponsor of the Programming Language Mentoring Workshop at SPLASH 2017. Our sponsorship went towards the travel costs for many women and other minorities to attend this workshop. The workshop focused on encouraging and preparing students to enter research careers in the field of programming languages, compilers, and related fields and to provide first hand perspectives on graduate school.

We hosted our first Women in Compilers & Tools reception before the 2017 US LLVM Developers' Meeting. Anna Zaks and Alice Chan participated in a panel discussion about the challenges and experiences that they have encountered in their careers and within the open source community. The event was attended by 60 members of the LLVM community.

In 2018, we look forward to another year of expanding our program. The LLVM Foundation will again sponsor the Grace Hopper Conference and we are looking for LLVM community members to help out at the career booth (more details to come). We will be having two Women in Compilers and Tools events. The first will have a reception and panel discussion before the 2018 EuroLLVM Developers' Meeting. Get your tickets here. The second will be before the 2018 US LLVM Developers' Meeting and details will be announced in the coming months.

The LLVM Foundation thanks the LLVM community and its sponsors for supporting this work. If you want to participate in the discussion or receive notifications on events, please join the Women in Compilers and Tools mailing list.

Question for the LLVM Foundation? Email us at [email protected].

Clang is now used to build Chrome for Windows

Mon, 05 Mar 2018 12:46:00 +0000

As of Chrome 64, Chrome for Windows is compiled with Clang. We now use Clang to build Chrome for all platforms it runs on: macOS, iOS, Linux, Chrome OS, Android, and Windows. Windows is the platform with the second most Chrome users after Android according to statcounter, which made this switch particularly exciting.

Clang is the first-ever open-source C++ compiler that’s ABI-compatible with Microsoft Visual C++ (MSVC) – meaning you can build some parts of your program (for example, system libraries) with the MSVC compiler (“cl.exe”), other parts with Clang, and when linked together (either by MSVC’s linker, “link.exe”, or LLD, the LLVM project’s linker – see below) the parts will form a working program.

Note that Clang is not a replacement for Visual Studio, but an addition to it. We still use Microsoft’s headers and libraries to build Chrome, we still use some SDK binaries like midl.exe and mc.exe, and many Chrome/Win developers still use the Visual Studio IDE (for both development and for debugging).

This post discusses numbers, motivation, benefits and drawbacks of using Clang instead of MSVC, how to try out Clang for Windows yourself, project history, and next steps. For more information on the technical side you can look at the slides of our 2015 LLVM conference talk, and the slides linked from there.

Numbers

This is what most people ask about first, so let’s talk about it first. We think the other sections are more interesting though.

Build time

Building Chrome locally with Clang is about 15% slower than with MSVC. (We’ve heard that Windows Defender can make Clang builds a lot slower on some machines, so if you’re seeing larger slowdowns, make sure to whitelist Clang in Windows Defender.) However, the way Clang emits debug info is more parallelizable and builds with a distributed build service (e.g. Goma) are hence faster.

Binary size

Chrome installer size gets smaller for 64-bit builds and slightly larger for 32-bit builds using Clang. The same difference shows in uncompressed code size for regular builds as well (see the tracking bug for Clang binary size for many numbers). However, compared to MSVC builds using link-time code generation (LTCG) and profile-guided optimization (PGO) Clang generates larger code in 64-bit for targets that use /O2 but smaller code for targets that use /Os. The installer size comparison suggests Clang's output compresses better.

Some raw numbers for versions 64.0.3278.2 (MSVC PGO) and 64.0.3278.0 (Clang). mini_installer.exe is Chrome’s installer that users download, containing the LZMA-compressed code. chrome_child.dll is one of the two main dlls; it contains Blink and V8, and generally has many targets that are built with /O2. chrome.dll is the other main dll, containing the browser process code, mostly built with /Os.

	mini_installer.exe	chrome.dll	chrome_child.dll	chrome.exe
32-bit win-pgo	45.46 MB	36.47 MB	53.76 MB	1.38 MB
32-bit win-clang	45.65 MB (+0.4%)	42.56 MB (+16.7%)	62.38 MB (+16%)	1.45 MB (+5.1%)
64-bit win-pgo	49.4 MB	53.3 MB	65.6 MB	1.6 MB
64-bit win-clang	46.27 MB (-6.33%)	50.6 MB (-5.1%)	72.71 MB (+10.8%)	1.57 MB (-1.2%)

Performance

We conducted extensive A/B testing of performance. Performance telemetry numbers are about the same for MSVC-built and clang-built Chrome – some metrics get better, some get worse, but all of them are within 5% of each other. The official MSVC builds used LTCG and PGO, while the Clang builds currently use neither of these. This is potential for improvement that we look forward to exploring. The PGO builds took a very long time to build due to the need for collecting profiles and then building again, and as a result, the configuration was not enabled on our performance-measurement buildbots. Now that we use Clang, the perf bots again track the configuration that we ship.

Startup performance was worse in Clang-built Chrome until we started using a link-order file – a form of “PGO light”.

Stability

We A/B-tested stability as well and found no difference between the two build configurations.

Motivation

There were many motivating reasons for this project, the overarching theme being the benefits of using the same compiler across all of Chrome’s platforms, as well as the ability to change the compiler and deploy those changes to all our developers and buildbots quickly. Here’s a non-exhaustive list of examples.

Chrome is heavily using technology that’s based on compiler instrumentation (ASan, CFI, ClusterFuzz, which uses ASan). Clang supports this instrumentation already, but we can’t add it to MSVC. We previously used after-the-fact binary instrumentation to mitigate this a bit, but having the toolchain write the right bits in the first place is cleaner and faster.
Clang enables us to write compiler plugins that add Chromium-specific warnings and to write tooling for large-scale refactoring. Chromium’s code search can now learn to index Windows code.
Chromium is open-source, so it’s nice if it’s built with an open-source toolchain.
Chrome runs on 6+ platforms, and most developers are only familiar with 1-3 platforms. If your patch doesn’t compile on a platform you’re unfamiliar with, due to a compiler error that you can’t reproduce on your local development machine, it’ll take you a while to fix. On the other hand, if all platforms use the same compiler, if it builds on your machine then it’s probably going to build on all platforms.
Using the same compiler also means that compiler-specific micro-optimizations will help on all platforms (assuming that the same -O flags are used on all platforms – not yet the case in Chrome, and only on the same ISAs – x86 vs ARM will stay different).
Using the same compiler enables cross-compiling – developers who feel most at home on a Linux box can now work on Windows-specific code, from their Linux box (without needing to run Wine).
We can continuously build Chrome trunk with Clang trunk to find compiler regressions quickly. This allows us to update Clang every week or two. Landing a major MSVC update in Chrome usually took a year or more, with several rounds of reporting internal compiler bugs and miscompiles. The issue here isn’t that MSVC is more buggy than Clang – it isn’t, all software is buggy – but that we can continuously improve Clang due to Clang being open-source.
C++ receives major new revisions every few years. When C++11 was released, we were still using six different compilers, and enabling C++11 was difficult. With fewer compilers, this gets much easier.
We can prioritize compiler features that are important to us. For example:

Deterministic builds were important to us before they were important for the MSVC team. For example, link.exe /incremental depends on an incrementing mtime timestamp in each object file.
We could enable warnings that fired in system headers long before MSVC added support for the system header concept.
cl.exe always prints the name of the input file, so that the build system has to filter it out for quiet builds.

Of course, not all – or even most – of these reasons will apply to other projects.

Benefits and drawbacks of using Clang instead of Visual C++

Benefits of using Clang, if you want to try for your project:

Clang supports 64-bit inline assembly. For example, in Chrome we built libyuv (a video format conversion library) with Clang long before we built all of Chrome with it. libyuv had highly-tuned 64-bit inline assembly with performance not reachable with intrinsics, and we could just use that code on Windows.
If your project runs on multiple platforms, you can use one compiler everywhere. Building your project with several compilers is generally considered good for code health, but in Chrome we found that Clang’s diagnostics found most problems and we were mostly battling compiler bugs (and if another compiler has a great new diagnostic, we can add that to Clang).
Likewise, if your project is Windows-only, you can get a second compiler’s opinion on your code, and Clang’s warnings might find bugs.
You can use Address Sanitizer to find memory bugs.
If you don’t use LTCG and PGO, it’s possible that Clang might create faster code.
Clang’s diagnostics and fix-it hints.

There are also drawbacks:

Clang doesn’t support C++/CX or #import “foo.dll”.
MSVC offers paid support, Clang only gives you the code and the ability to write patches yourself (although the community is very active and helpful!).
MSVC has better documentation.
Advanced debugging features such as Edit & Continue don’t work when using Clang.

How to use

If you want to give Clang for Windows a try, there are two approaches:

You could use clang-cl, a compiler driver that tries to be command-line flag compatible with cl.exe (just like Clang tries to be command-line flag compatible with gcc). The Clang user manual describes how you can tell popular Windows build systems how to call clang-cl instead of cl.exe. We used this approach in Chrome to keep the Clang/Win build working alongside the MSVC build for years, with minimal maintenance cost. You can keep using link.exe, all your current compile flags, the MSVC debugger or windbg, ETW, etc. clang-cl even writes warning messages in a format that’s compatible with cl.exe so that you can click on build error messages in Visual Studio to jump to the right file and line. Everything should just work.
Alternatively, if you have a cross-platform project and want to use gcc-style flags for your Windows build, you can pass a Windows triple (e.g. --target=x86_64-windows-msvc) to regular Clang, and it will produce MSVC-ABI-compatible output. Starting in Clang 7.0.0, due Fall 2018, Clang will also default to CodeView debug info with this triple.

Since Clang’s output is ABI-compatible with MSVC, you can build parts of your project with clang and other parts with MSVC. You can also pass /fallback to clang-cl to make it call cl.exe on files it can’t yet compile (this should be rare; it never happens in the Chrome build).

clang-cl accepts Microsoft language extensions needed to parse system headers but tries to emit -Wmicrosoft-foo warnings when it does so (warnings are ignored for system headers). You can choose to fix your code, or pass -Wno-microsoft-foo to Clang.

link.exe can produce regular PDB files from the CodeView information that Clang writes.

Project History

We switched chrome/mac and chrome/linux to Clang a while ago. But on Windows, Clang was still missing support for parsing many Microsoft language extensions, and it didn’t have any Microsoft C++ ABI-compatible codegen at all. In 2013, we spun up a team to improve Clang’s Windows support, consisting half of Chrome engineers with a compiler background and half of other toolchain people. In mid-2014, Clang could self-host on Windows. In February 2015, we had the first fallback-free build of 64-bit Chrome, in July 2015 the first fallback-free build of 32-bit Chrome (32-bit SEH was difficult). In Oct 2015, we shipped a first clang-built Chrome to the Canary channel. Since then, we’ve worked on improving the size of Clang’s output, improved Clang’s debug information (some of it behind -instcombine-lower-dbg-declare=0 for now), and A/B-tested stability and telemetry performance metrics.

We use versions of Clang that are pinned to a recent upstream revision that we update every one to three weeks, without any local patches. All our work is done in upstream LLVM.

Mid-2015, Microsoft announced that they were building on top of our work of making Clang able to parse all the Microsoft SDK headers with clang/c2, which used the Clang frontend for parsing code, but cl.exe’s codegen to generate code. Development on clang/c2 was halted again in mid-2017; it is conceivable that this was related to our improvements to MSVC-ABI-compatible Clang codegen quality. We’re thankful to Microsoft for publishing documentation on the PDB file format, answering many of our questions, fixing Clang compatibility issues in their SDKs, and for giving us publicity on their blog! Again, Clang is not a replacement for MSVC, but a complement to it.

Opera for Windows is also compiled with Clang starting in version 51.

Firefox is also looking at using clang-cl for building Firefox for Windows.

Next Steps

Just as clang-cl is a cl.exe-compatible interface for Clang, lld-link is a link.exe-compatible interface for lld, the LLVM linker. Our next step is to use lld-link as an alternative to link.exe for linking Chrome for Windows. This has many of the same advantages as clang-cl (open-source, easy to update, …). Also, using clang-cl together with lld-link allows using LLVM-bitcode-based LTO (which in turn enables using CFI) and using PE/COFF extensions to speed up linking. A prerequisite for using lld-link was its ability to write PDB files.
We’re also considering using libc++ instead of the MSVC STL – this allows us to instrument the standard library, which is again useful for CFI and Address Sanitizer.

In Closing

Thanks to the whole LLVM community for helping to create the first new production C++ compiler for Windows in over a decade, and the first-ever open-source C++ compiler that’s ABI-compatible with MSVC!

EuroLLVM'18 developers' meeting program

Thu, 01 Mar 2018 14:31:00 +0000

The LLVM Foundation is excited to announce the program for the EuroLLVM'18 developers' meeting (April 16 - 17 in Bristol/UK)!

Keynotes

The Cerberus Memory Object Semantics for ISO and De Facto C P. Sewell
LLVM x Blockchains = A new Ecosystem of Decentralized Applications R. Zhong

Tutorials

Pointers, Alias & ModRef Analyses A. Sbirlea, N. Lopes
Scalar Evolution - Demystified J. Absar

Talks

A Parallel IR in Real Life: Optimizing OpenMP H. Finkel, J. Doerfert, X. Tian, G. Stelle
An Introduction to AMD Optimizing C/C++ Compiler A. Team
Analysis of Executable Size Reduction by LLVM passes V. Sinha, P. Kumar, S. Jain, U. Bora, S. Purini, R. Upadrasta
Developing Kotlin/Native infrastructure with LLVM/Clang, travel notes. N. Igotti
Extending LoopVectorize to Support Outer Loop Vectorization Using VPlan D. Caballero, S. Guggilla
Finding Iterator-related Errors with Clang Static Analyzer Á. Balogh
Finding Missed Optimizations in LLVM (and other compilers) G. Barany
Global code completion and architecture of clangd E. Liu, H. Wu, I. Biryukov, S. McCall
Hardening the Standard Library M. Clow
Implementing an LLVM based Dynamic Binary Instrumentation framework C. Hubain, C. Tessier
LLVM Greedy Register Allocator – Improving Region Split Decisions M. Yatsina
MIR-Canon: Improving Code Diff Through Canonical Transformation. P. Lotfi
New PM: taming a custom pipeline of Falcon JIT F. Sergeev
Organising benchmarking LLVM-based compiler: Arm experience E. Astigeevich
Performance Analysis of Clang on DOE Proxy Apps H. Finkel, B. Homerding
Point-Free Templates A. Gozillon, P. Keir
Protecting the code: Control Flow Enforcement Technology O. Simhon

BoFs

Towards implementing #pragma STDC FENV_ACCESS U. Weigand
Build system integration for interactive tools I. Biryukov, H. Wu, E. Liu, S. McCall
Clang Static Analyzer BoF G. Horváth
LLVM Foundation BoF LLVM Foundation Board of Directors

Student Research Competition

Compile-Time Function Call Interception to Mock Functions in C/C++ G. Márton, Z. Porkoláb
Improved Loop Execution Modeling in the Clang Static Analyzer P. Szécsi
Using LLVM in a Model Checking Workflow G. Sallai

Lightning Talks

C++ Parallel Standard Template Library support in LLVM M. Dvorskiy, J. Cownie, A. Kukanov
Can reviews become less of a bottleneck? K. Beyls
Clacc: OpenACC Support for Clang and LLVM J. Denny, S. Lee, J. Vetter
DragonFFI: Foreign Function Interface and JIT using Clang/LLVM A. Guinet
Easy::Jit: Compiler-assisted library to enable Just-In-Time compilation for C++ codes J. Fernandez, S. Guelton
Flang -- Project Update S. Scalpone
Look-Ahead SLP: Auto-vectorization in the Presence of Commutative Operations V. Porpodas, R. Rocha, L. Góes
Low Cost Commercial Deployment of LLVM J. Bennett
Measuring the User Debugging Experience G. Bedwell
Measuring x86 instruction latencies with LLVM G. Chatelet, C. Courbet, B. De Backer, O. Sykora
OpenMP Accelerator Offloading with OpenCL using SPIR-V D. Schürmann, J. Lucas, B. Juurlink
Parallware, LLVM and supercomputing M. Arenaz
Returning data-flow to asynchronous programming through static analysis M. Gilbert
RFC: A new divergence analysis for LLVM S. Moll, T. Klössner, S. Hack
Static Performance Analysis with LLVM C. Courbet, O. Sykora, G. Chatelet, B. De Backer
Supporting the RISC-V Vector Extensions in LLVM R. Kruppe, J. Oppermann, A. Koch
Using Clang Static Analyzer to detect Critical Control Flow S. Cook

Posters

Automatic Profiling for Climate Modeling A. Gerbes, N. Jumah, J. Kunkel
Cross Translation Unit Analysis in Clang Static Analyzer: Qualitative Evaluation on C/C++ projects G. Horvath, P. Szecsi, Z. Gera, D. Krupp
Effortless Differential Analysis of Clang Static Analyzer Changes G. Horváth, R. Kovács, P. Szécsi
Offloading OpenMP Target Regions to FPGA Accelerators Using LLVM L. Sommer, J. Oppermann, J. Korinth, A. Koch
Using clang as a Frontend on a Formal Verification Tool M. Gadelha, J. Morse, L. Cordeiro, D. Nicole

If you are interested in any of these talks, you should register to attend EuroLLVM'18. Tickets are limited!

More information about EuroLLVM'18 is available here.

LLVM accepted to 2018 Google Summer of Code!

Wed, 14 Feb 2018 13:36:00 +0000

We are excited to announce the LLVM project has been accepted to 2018 Google Summer of Code!

What is Google Summer of Code?

Google Summer of Code (GSoC) is a global program focused on introducing students to open source software development. Students work on a 3 month programming project with an open source organization during their break from university. There are several benefits to this program for both the students and LLVM:

Inspire students to get involved with open source, compilers and LLVM
Give students exposure to real-world software development while getting paid a stipend
Allow students to do paid work related to their academic pursuits versus getting an unrelated summer job
Bring new developers into the LLVM project
Some LLVM bugs get fixed or new features get added

Students - Apply now!

Ok, so you can't apply right now as the official application to GSoC does not open until March 12, 2018, but you must begin discussing your project on the LLVM mailing lists well before that date. There are many open projects listed on our webpage. Once you have selected a project, you will discuss it on the appropriate mailing list.

If you have an idea for a project that is not listed, you can always propose it on the list as well and seek out a mentor.

Key Dates to Remember

We have listed a few key dates here, but always consult the official GSoC timeline to confirm.

March 12 (16:00 UTC) - Applications open
March 27 (16:00 UTC) - Deadline to file your application
April 23 (16:00 UTC) - Accepted student proposals are announced
May 14 - Coding begins

LLVM Developers - Consider being a mentor!

This program is not a success without our mentors. Thank you to all who have already volunteered! If you have never mentored a GSoC project but are curious, it is not too late to volunteer! You can either select an open project without a mentor or propose your own. Make sure to get it listed on the webpage so that students can see it as an option.

If mentoring just isn't an option for you at this time, consider helping the project out by spreading the word about GSoC.

Questions?

If you have questions about the program for the organizers, please email [email protected]. Project specific questions should be sent to the appropriate developer mailing list instead.

Improving Link Time on Windows with clang-cl and lld

Mon, 08 Jan 2018 10:06:00 +0000

One of our goals in bringing clang and lld to Windows has always been to improve developer experience, and what is it that developers want the most? Faster build times! Recently, our focus has been on improving link time because it's the step that's the hardest to parallelize so we can't fall back on the time honored tradition of throwing more cores at it.

Of the various steps involved in linking, generating the debug info (which, on Windows, is a PDB file) is by far the slowest since it involves merging O(# of linker inputs) sequences of type records, most of which are duplicate anyway. For example, if two cpp files both include <string>, then both of those object files will have hundreds of duplicate type records that need to be de-duplicated during the link step. This means you have to compute O(M x N) hash values, even though only a small fraction of those ultimately contribute to the final PDB.

Several strategies have been invented to deal with this over the years and try to make linking faster. Many years ago, Microsoft introduced the notion of a Type Server (enabled via /Zi compiler option in MSVC), which moves some of the work into the compiler (to take advantage of parallelism). More recently we have been given the /DEBUG:FASTLINK linker option which attempts to solve the problem by not merging types at all in the linker. However, each of these strategies has its own set of disadvantages, and neither can be considered perfect for all use cases.

In this blog post, we'll first go over some technical background about CodeView so that we can understand the problem, followed by a summary of existing attempts to speed up type merging. Then, we'll describe a novel extension to the PE/COFF file format which speeds up linking by offloading part of the work required to de-duplicate types to the compiler and using a new algorithm which uniquely identifies type records even across input files, and discuss the various tradeoffs of each approach. Finally, we'll present some benchmarks and discuss how you can try this out in clang-cl and lld today.

Background

Consider a simple structure in C++, defined like this in a header file:

    struct Node {
      Node *Next = nullptr;
      Node *Prev = nullptr;
      int Value = 0;
    };

Since each compilation happens independently of every other compilation, the compiler cannot assume any other translation unit will ever emit the records necessary to describe this type. As a result, to guarantee that the type makes it into the final PDB, every compiler instance that encounters this definition must emit type information for this type. So the record will be serialized by the compiler into a series of records that looks roughly like this:

    0x1004 | LF_STRUCTURE [size = 40] `Node`
             unique name: `.?AUNode@@`
             vtable: <none>
             base list: <none>
             field list: <none>
             options: forward ref | has unique name
    0x1005 | LF_POINTER [size = 12]
             referent = 0x1004
             mode = pointer
             opts = None
             kind = ptr32
    0x1006 | LF_FIELDLIST [size = 52]
             - LF_MEMBER
               name = `Next`
               Type = 0x1005
               Offset = 0
               attrs = public
             - LF_MEMBER
               name = `Prev`
               Type = 0x1005
               Offset = 4
               attrs = public
             - LF_MEMBER
               name = `Value`
               Type = 0x0074 (int)
               Offset = 8
               attrs = public
    0x1007 | LF_STRUCTURE [size = 40] `Node`
             unique name: `.?AUNode@@`
             vtable: <none>
             base list: <none>
             field list: 0x1006
             options: has unique name

The values on the left correspond to each type's index in the type sequence and depend on what types have already been encountered. Other records can then refer to them by index: for example, referent = 0x1004 means that this record is a pointer to whatever type is at index 0x1004.

As a result of this design, another compilation unit which includes the same header file will need to emit this exact same type, with the only difference being the indices (since the other compilation may encounter other types before this one, causing the ordering to be different).

In short, type indices only make sense within the context of a single type sequence (i.e. compiland), but since the linker needs to see across all object files, it has to have some way of identifying whether a type from object file A is isomorphic to a different type from object file B, even if its type indices might be different numerically from any previously seen type.
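To make the index problem concrete, here is a minimal sketch under simplifying assumptions. It is not LLD's data model: Record, TypeStream, sameBytes, isomorphic, streamA and streamB are names invented for this illustration, and a record is reduced to a kind, a name, and the stream-local indices it refers to. The same pointer-to-Node record lands at different indices in two streams, so comparing the raw index values fails, while following each index into its own stream succeeds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified type record. Refs holds indices into the SAME
// stream, and (as in the dump above) always points at earlier records.
struct Record {
  std::string Kind;
  std::string Name;
  std::vector<uint32_t> Refs;
};

using TypeStream = std::vector<Record>;

// Naive comparison: treats type indices as plain numbers.
bool sameBytes(const Record &A, const Record &B) {
  return A.Kind == B.Kind && A.Name == B.Name && A.Refs == B.Refs;
}

// Structural comparison: resolves every index in its own stream.
bool isomorphic(const TypeStream &SA, uint32_t IA,
                const TypeStream &SB, uint32_t IB) {
  assert(IA < SA.size() && IB < SB.size());
  const Record &A = SA[IA], &B = SB[IB];
  if (A.Kind != B.Kind || A.Name != B.Name || A.Refs.size() != B.Refs.size())
    return false;
  for (std::size_t I = 0; I < A.Refs.size(); ++I)
    if (!isomorphic(SA, A.Refs[I], SB, B.Refs[I]))
      return false;
  return true;
}

// a.cpp: Node is the first type the compiler encounters.
TypeStream streamA() {
  return {{"LF_STRUCTURE (fwd)", "Node", {}},   // index 0
          {"LF_POINTER", "", {0}}};             // index 1
}

// b.cpp: an unrelated type comes first, which shifts every later index.
TypeStream streamB() {
  return {{"LF_STRUCTURE (fwd)", "Other", {}},  // index 0
          {"LF_STRUCTURE (fwd)", "Node", {}},   // index 1
          {"LF_POINTER", "", {1}}};             // index 2
}
```

With these two streams, sameBytes(streamA()[1], streamB()[2]) is false even though both records describe a pointer to Node, whereas isomorphic(streamA(), 1, streamB(), 2) is true. The recursive walk is what makes a naive structural comparison expensive.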

This algorithm, henceforth referred to as type merging, is the primary consumer of CPU cycles during linking (measured in LLD, and estimated in MSVC linker by comparing /DEBUG:FULL vs /DEBUG:FASTLINK times), and as such it is the portion of the linking process which this blog post presents a new solution to.

Existing Solutions

It’s worthwhile to discuss some of the existing attempts to reduce the cost associated with type merging so that we can compare and contrast their various pros and cons.

Type Servers (/Zi)

The /Zi compiler option was one of the first attempts to address type merging speed, and it dates back many years. The idea behind type servers is to offload the work of de-duplication from the linking phase to the compilation phase. Most build systems already support parallel compilation, and even if they don’t, cl.exe supports it natively via the /MP compiler switch, so there is no roadblock to anyone taking advantage of parallel compilation.

To implement type servers, each compilation process communicates via IPC with a single process (mspdbsrv.exe) whose job is to de-duplicate type records on the fly. When a record is isomorphic to an existing record, the type server communicates back the previously saved index; when it is new, it sends back a new index. This allows type de-duplication to happen mostly in parallel, adding some overhead to each compilation (since there is contention over a global lock) in return for significantly reduced link times, since types will already have been merged.
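The request/response pattern described above can be sketched in a few lines. This is only an in-process stand-in, not how mspdbsrv.exe is implemented: the TypeServer class and its methods are hypothetical, a record is reduced to a string, and a std::mutex plays the role of the global lock that every compilation contends on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a type server: one shared table that hands out
// a stable index per distinct record.
class TypeServer {
public:
  // Returns the previously saved index for a known record, or assigns and
  // returns a new index for a record that has not been seen before.
  uint32_t getOrAssignIndex(const std::string &Record) {
    std::lock_guard<std::mutex> Lock(Mutex);  // every caller serializes here
    auto Result = Indices.emplace(Record, NextIndex);
    if (Result.second) {
      ++NextIndex;
      assert(NextIndex != 0 && "type index space exhausted");
    }
    return Result.first->second;
  }

  std::size_t numUniqueTypes() const {
    std::lock_guard<std::mutex> Lock(Mutex);
    return Indices.size();
  }

private:
  mutable std::mutex Mutex;
  std::unordered_map<std::string, uint32_t> Indices;
  uint32_t NextIndex = 0x1000;  // assumption: mirrors the indices in the dump above
};
```

Two compilations that both ask for the index of the same record get the same answer, so nothing is left for the linker to merge. The cost is that every single request takes the lock, which is the contention discussed in the list of disadvantages.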

Type servers bring with them some disadvantages though, so we enumerate them here:

Type servers add significant context switching and global lock contention to the compilation phase, reducing parallelism and degrading overall system performance while a build is in process. While some performance is reclaimed from the linker, some is sacrificed due to the use of a global system lock. It’s still a net win, but as it is not free, it leaves open the possibility that we may be able to achieve better parallelism using a different approach.
The type server process itself (mspdbsrv.exe) introduces a single point of failure. When it crashes (we see C1033 several times per day on Chrome, for example, which seems to indicate an mspdbsrv.exe crash) it could trigger a full rebuild if the type server PDB file is left in a corrupt state.
mspdbsrv is incompatible with distributed builds, which is a show-stopper for large applications that can take several hours to build on normal workstations. Type servers operate only via local IPC. While multi-processing works well for small applications, many large products have build farms that distribute compilations among tens or hundreds of physical machines. Type servers are incompatible with this scenario.

Fastlink PDBs

Fastlink PDBs are a relatively recent introduction, and the approach used by this solution is to eliminate type merging entirely. To support this, special metadata is set in the PDB file to indicate to the tool that this is a fastlink PDB, and when the tool (e.g. debugger) encounters this metadata, it will fetch all type information from the original object file, rather than from the PDB. As before, there are several disadvantages to this approach, enumerated here:

The pdbcopy utility is almost unusable with fastlink PDBs for performance reasons.
Since type merging doesn’t happen, indexing of type information also doesn’t happen (since the expensive part of building an index -- the hashing -- comes for free when you were hashing the record anyway). This leads to degradation in the debugger user experience, since waits which previously happened only at build time now happen at debug-time.
Fastlink PDBs are not portable. The PDB references the object files by path, so if you copy the PDB and object files to a different machine (or even different path on the same machine) for archival purposes, they can no longer be debugged. This is a deal-breaker for using it on production builds.
Symbols can’t be enumerated in a Fastlink PDB. This is most obvious if you attempt to use DIA SDK on a Fastlink PDB, where it will simply refuse to do anything at all. This means that the only externally supported way of querying debug info for users is impossible against a Fastlink PDB. Beyond that, however, it also means that even Microsoft’s own tools which need to enumerate symbols cannot use any standard API for doing so. For example, WinDbg doesn’t fully support Fastlink PDBs, and many workflows are broken by the use of them, even using supported Microsoft tools.
It has several serious stability issues which make it unusable on large projects [ref]. This is probably related to the previous point, namely the fact that every tool that wants to be able to work with a Fastlink PDB needs to use different code than the SDK that has been tested and battle-hardened through years of development.
When compiling with clang-cl and linking with /debug:fastlink the compiler has to be instructed to emit additional debug information, making .obj files about 29% larger.

Clang's Solution - The COFF .debug$H section

This new approach tries to combine the ideas behind type servers and fastlink PDBs. Like type servers, it attempts to offload the work of de-duplication to the compilation phase so that it can be done in parallel. However, it does so using an algorithm with the property that the resulting hash can be used to identify a type record even across type streams. Specifically, if two records have the same hash, they are the same record even if they are from different object files. If you can take it on faith that such an algorithm exists (which will be henceforth referred to as a global hash), then the amount of work that the linker needs to perform is greatly reduced. And the work that it does still have to do can be done much quicker. Perhaps most importantly, it produces a byte-for-byte identical PDB to when the option is not used, meaning all of the issues surrounding Fastlink PDBs and compatibility are gone.

Previously, the linker would do something that looks roughly like this:

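    HashTable<Type> HashedTypes;
    vector<Type> MergedTypes;
    for (ObjectFile &Obj : Objects) {
      for (Type &T : Obj.types()) {
        // Rewrite T's type indices in terms of the merged stream.
        remapAllTypeIndices(MergedTypes, T);
        // Skip records that are already present in the merged stream.
        if (!HashedTypes.try_insert(T))
          continue;
        MergedTypes.push_back(T);
      }
    }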

The important observations here are:

remapAllTypeIndices is called unconditionally for every type in every object file.
A hash of the type is computed unconditionally for every type.
At least one full record comparison is done for every type. In practice it turns out to be much more, because hash buckets are computed modulo table size, so there will actually be 1 full record comparison for every probe.
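For readers who want to run the traditional loop, here is a compilable sketch of it under simplifying assumptions. It is not LLD's implementation: Record, RecordHash, MergeResult and mergeTypes are hypothetical names, a record is reduced to its non-index bytes plus a list of type indices, and std::unordered_map stands in for the linker's hash table. The NumRemapped counter makes the first observation visible: every input record is remapped, whether or not it turns out to be a duplicate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified type record. Refs always points at earlier
// records of the same object file.
struct Record {
  std::string Body;
  std::vector<uint32_t> Refs;
  bool operator==(const Record &O) const {
    return Body == O.Body && Refs == O.Refs;  // full record comparison
  }
};

struct RecordHash {
  std::size_t operator()(const Record &R) const {
    std::size_t H = std::hash<std::string>()(R.Body);
    for (uint32_t Ref : R.Refs)
      H = H * 31 + Ref;
    return H;
  }
};

struct MergeResult {
  std::vector<Record> Merged;
  std::size_t NumRemapped = 0;  // records whose indices were rewritten
};

MergeResult mergeTypes(const std::vector<std::vector<Record>> &Objects) {
  MergeResult Result;
  std::unordered_map<Record, uint32_t, RecordHash> Seen;
  for (const std::vector<Record> &Obj : Objects) {
    std::vector<uint32_t> LocalToMerged;  // object index -> merged index
    for (Record T : Obj) {                // work on a copy of the record
      for (uint32_t &Ref : T.Refs) {
        assert(Ref < LocalToMerged.size() && "forward reference");
        Ref = LocalToMerged[Ref];
      }
      ++Result.NumRemapped;  // done before we know whether T is new
      // emplace hashes T and compares full records on a bucket hit.
      auto Ins = Seen.emplace(T, static_cast<uint32_t>(Result.Merged.size()));
      if (Ins.second)
        Result.Merged.push_back(T);
      LocalToMerged.push_back(Ins.first->second);
    }
  }
  return Result;
}
```

Merging two object files that share a forward-declared Node and a pointer to it yields each shared record once, but NumRemapped still equals the total number of input records.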

Given a global hash function as described above, the algorithm can be re-written like this:

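    HashMap<SHA1, int> HashToIndex;
    vector<Type> OrderedTypes;
    for (ObjectFile &Obj : Objects) {
      // Hashes precomputed by the compiler, one per type record.
      auto Hashes = Obj.DebugHSectionHashes;
      for (int I = 0; I < Obj.NumTypes; ++I) {
        int NextIndex = OrderedTypes.size();
        // Duplicate hash: the record is already merged, so skip it.
        if (!HashToIndex.try_emplace(Hashes[I], NextIndex))
          continue;
        Type &T = Obj.types()[I];  // only new records are ever touched
        remapAllTypeIndices(T);
        OrderedTypes.push_back(T);
      }
    }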

While this appears very similar, its performance characteristics are quite different.

remapAllTypeIndices is only called when the record is actually new. Which, as we discussed earlier, is a small fraction of the time over many linker inputs.
A hash of the type is never computed by the linker. It is simply there in the object file (the exception to this is mixed linker inputs, discussed earlier, but those are a small fraction of input files).
Full record comparisons never happen. Since we are using a strong hash function with negligible chance of false collisions, and since the hash of a record provides equality semantics across streams, the hash is as good as the record itself.

Combining all of these points, we get an algorithm that is extremely cache friendly. Amortized over all input files, most records during type merging are cache hits (i.e. duplicate records). With this algorithm when we get a cache hit, the only two data structures that are accessed are:

An array of contiguous hash values.
An array of contiguous hash buckets.

Since we never do full equality comparison (which would blow out the L1 and sometimes even L2 cache due to the average size of a type record being larger than a cache line) the algorithm here is very fast.
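The hash-based merge loop described above can be sketched in standard C++ as follows. This is only an illustration under simplifying assumptions: `GlobalHash`, `TypeRecord`, `ObjectFile` and `mergeTypes` are hypothetical names, a `std::string` stands in for a fixed-size SHA1 value, and the type index remapping step is left as a comment rather than implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins: a real linker would use a fixed-size hash value
// (for example 20-byte SHA1) and real CodeView type records.
using GlobalHash = std::string;
struct TypeRecord {
  std::string Bytes;
};

struct ObjectFile {
  std::vector<TypeRecord> Types;  // records from .debug$T
  std::vector<GlobalHash> Hashes; // one precomputed hash per record (.debug$H)
};

// Merges the type records of all objects, keeping the first copy of each.
std::vector<TypeRecord> mergeTypes(const std::vector<ObjectFile> &Objects) {
  std::unordered_map<GlobalHash, std::size_t> HashToIndex;
  std::vector<TypeRecord> OrderedTypes;
  for (const ObjectFile &Obj : Objects) {
    assert(Obj.Hashes.size() == Obj.Types.size());
    for (std::size_t I = 0; I < Obj.Types.size(); ++I) {
      std::size_t NextIndex = OrderedTypes.size();
      // try_emplace inserts only if this hash has not been seen before.
      bool Inserted = HashToIndex.try_emplace(Obj.Hashes[I], NextIndex).second;
      if (!Inserted)
        continue; // duplicate: only the hash array and hash table were touched
      // A real linker would remap the type indices inside the record here.
      OrderedTypes.push_back(Obj.Types[I]);
    }
  }
  return OrderedTypes;
}
```

Note that on the duplicate path the record bytes in `Obj.Types` are never read, which is the property the cache-friendliness argument relies on.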

We’ve deferred discussion of how to create such a hash up until now, but it is actually fairly straightforward. We use what is known as a “tree hash” or “Merkle tree”. The idea is to pass bytes from a type record directly to the hash function up until the point we get to a type index. Then, instead of passing the numeric value of the type index to the hash function, we pass the previously computed hash of the record that is being referenced.

Such a hash is very fast to compute in the compiler because the compiler must already hash types anyway, so the incremental cost to emit this to the .debug$H section is negligible. For example, when a type is encountered in a translation unit, before you can add that type to the object file’s .debug$T section, it must first be verified that the type has not already been added. And since this is happening naturally in the order in which types are encountered, all that has to be done is to save these hash values in an array indexed by type index, and subsequent hash operations will have O(1) access to all of the information needed to compute this Merkle hash.
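To make the tree-hash idea concrete, here is a minimal sketch. It assumes a heavily simplified record layout (`TypeRecord` with a byte string plus a list of referenced local type indices) and uses 64-bit FNV-1a as a small stand-in for SHA1; `fnv1a` and `computeGlobalHashes` are hypothetical names, not part of clang or lld.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical simplified record: its non-index bytes plus the local type
// indices (positions in the same stream) of the records it refers to.
struct TypeRecord {
  std::string Bytes;
  std::vector<std::size_t> Refs;
};

// 64-bit FNV-1a, used here only as a compact stand-in for SHA1.
std::uint64_t fnv1a(const void *Data, std::size_t Size, std::uint64_t H) {
  const unsigned char *P = static_cast<const unsigned char *>(Data);
  for (std::size_t I = 0; I < Size; ++I) {
    H ^= P[I];
    H *= 1099511628211ULL; // FNV prime
  }
  return H;
}

// Computes one hash per record, in stream order. The result is indexed by
// local type index, so referenced hashes are available in O(1).
std::vector<std::uint64_t>
computeGlobalHashes(const std::vector<TypeRecord> &Stream) {
  std::vector<std::uint64_t> Hashes;
  Hashes.reserve(Stream.size());
  for (std::size_t I = 0; I < Stream.size(); ++I) {
    const TypeRecord &T = Stream[I];
    std::uint64_t H = 14695981039346656037ULL; // FNV offset basis
    H = fnv1a(T.Bytes.data(), T.Bytes.size(), H);
    for (std::size_t Ref : T.Refs) {
      assert(Ref < I && "a record may only reference earlier records");
      // Feed the hash of the referenced record, not its numeric index.
      std::uint64_t RefHash = Hashes[Ref];
      H = fnv1a(&RefHash, sizeof(RefHash), H);
    }
    Hashes.push_back(H);
  }
  return Hashes;
}
```

Because a reference contributes the referenced record's hash rather than its index, a "pointer to int" record hashes identically in two streams even when "int" sits at a different type index in each of them.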

Mixed Input Files and Compiler/Linker Compatibility

A linker must be prepared to deal with a mixed set of input files. For example, while a particular compiler may choose to always emit .debug$H sections, a linker must be prepared to link objects that for whatever reason do not have this section. To handle this, the linker can examine all inputs up front and manually compute hashes for inputs with missing .debug$H sections. In practice this proves to be a small fraction and the penalty for doing this serially is negligible, although it should be noted that in theory this can also be done as a parallel pre-processing step if some use cases show that this has non-negligible cost.

Similarly, the emission of this section in an object file has no impact on linkers which have not been taught to use it. Since it is a purely additive (and optional) inclusion into the object file, any linker which does not understand it will continue to work exactly as it does today.

The On-Disk Format

Clang uses the following on-disk format for the .debug$H section.

0x0     : <Section Magic>  (4 bytes)
0x4     : <Version>        (2 bytes)
0x6     : <Hash Algorithm> (2 bytes)
0x8     : <Hash Value>     (N bytes)
0x8 + N : <Hash Value>     (N bytes)
…

Here, “Section Magic” is an arbitrarily chosen 4-byte number whose purpose is to provide some level of certainty that what we’re seeing is a real .debug$H section, and not some section that someone created that accidentally happened to be called that. Our current implementation uses the value 0x133C9C5 (20171205 in decimal), which represents the date of the initial prototype implementation. But this can be any reasonable value here, as long as it never changes.

“Version” is reserved for future use, so that the format of the section can theoretically change.

“Hash Algorithm” is a value that indicates what algorithm was used to generate the hashes that follow. As such, the value of N above is also a function of what hash algorithm is used. Currently, the only proposed value for Hash Algorithm is SHA1 = 0, which would imply N = 20 when Hash Algorithm = 0. Should it prove useful to have truncated 8-byte SHA1 hashes, we could define SHA1_8 = 1, for example.
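A reader for the layout above might look like the following sketch. It is not lld's implementation: `DebugHSection`, `readLE` and `parseDebugH` are hypothetical names, the fields are assumed to be stored little-endian (the text does not state the byte order), and only the SHA1 = 0 algorithm described above is accepted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr std::uint32_t DebugHMagic = 0x133C9C5; // magic value quoted above
constexpr std::uint16_t HashAlgoSHA1 = 0;        // implies N = 20

struct DebugHSection {
  std::uint16_t Version = 0;
  std::uint16_t HashAlgorithm = 0;
  std::size_t HashSize = 0;                       // N
  std::vector<std::vector<std::uint8_t>> Hashes;  // one entry per type record
};

// Reads a little-endian integer of Size bytes (Size <= 4) at offset Off.
// Little-endian storage is an assumption of this sketch.
std::uint32_t readLE(const std::vector<std::uint8_t> &D, std::size_t Off,
                     std::size_t Size) {
  assert(Size <= 4 && Off + Size <= D.size());
  std::uint32_t V = 0;
  for (std::size_t I = 0; I < Size; ++I)
    V |= static_cast<std::uint32_t>(D[Off + I]) << (8 * I);
  return V;
}

// Returns the parsed section, or nullopt if the data is not a valid section.
std::optional<DebugHSection> parseDebugH(const std::vector<std::uint8_t> &Data) {
  if (Data.size() < 8 || readLE(Data, 0, 4) != DebugHMagic)
    return std::nullopt;
  DebugHSection S;
  S.Version = static_cast<std::uint16_t>(readLE(Data, 4, 2));
  S.HashAlgorithm = static_cast<std::uint16_t>(readLE(Data, 6, 2));
  if (S.HashAlgorithm != HashAlgoSHA1)
    return std::nullopt; // unknown algorithm, so N is unknown
  S.HashSize = 20;
  if ((Data.size() - 8) % S.HashSize != 0)
    return std::nullopt; // trailing partial hash
  for (std::size_t Off = 8; Off < Data.size(); Off += S.HashSize) {
    auto First = Data.begin() + static_cast<std::ptrdiff_t>(Off);
    S.Hashes.emplace_back(First,
                          First + static_cast<std::ptrdiff_t>(S.HashSize));
  }
  return S;
}
```

Rejecting unknown algorithm values, rather than guessing N, lets a linker fall back to computing hashes itself for that input.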

Limitations and Pitfalls

The biggest limitation of this format is that it increases object file size. Experiments locally on fairly large projects show an average aggregate object file size increase of ~15% compared to /DEBUG:FULL (which, for clang-cl, actually makes .debug$H object files smaller than those needed to support /DEBUG:FASTLINK).

There is another, less obvious potential pitfall as well. The worst case scenario is when no input files have a .debug$H section present, but this limitation is the same in principle even if only a subset of files have a .debug$H section. Since the linker must agree on a single hash function for all object files, there is the question of what to do when not all object files agree on hash function, or when not all object files contain a .debug$H section. If the code is not written carefully, you could get into a situation where, for example, no input files contain a .debug$H section so the linker decides to synthesize one on the fly for every input file. Since SHA1 (for example) is quite slow, this could cause a huge performance penalty.

This limitation can be coded around with some care, however. For example, tree hashes can be computed up-front in parallel as a pre-processing step. Alternatively, a hash function could be chosen based on some heuristic estimate of what would likely lead to the fastest link (based on the percentage of inputs that had a .debug$H section, for example). There are other possibilities as well. The important thing is to just be aware of this potential pitfall, and if your links become very slow, you'll know that the first thing you should check is "do all my object files have .debug$H sections?"

Finally, since a hash is considered to be identical to the original record, we must consider the possibility of collisions. That said, this does not appear to be a serious concern in practice. A single PDB can have a theoretical maximum of 2³² type records anyway (due to a type index being 4 bytes). The following table shows the expected number of type records needed for a collision to exist as a function of hash size.

Hash Size (Bytes)	Average # of records needed for a collision
4	82,137
6	21,027,121
8	5,382,943,231
12	3.53 x 10¹⁴
16	2.31 x 10¹⁹
20	1.52 x 10²⁴

Given that this is strictly for debug information and not generated code, it’s worth thinking about the severity of a collision. We feel that an 8-byte hash is probably acceptable for real world use.
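The figures in the table are consistent with the usual birthday-problem approximation, in which the expected number of uniformly random values drawn before the first collision is about sqrt(pi/2 * 2^bits). The sketch below evaluates that approximation; the function name is hypothetical, and it is an assumption that this is how the table was produced.

```cpp
#include <cassert>
#include <cmath>

// Approximate expected number of uniformly random hash values drawn before
// the first collision, for a hash of HashBytes bytes: sqrt(pi/2 * 2^bits).
double expectedRecordsForCollision(int HashBytes) {
  assert(HashBytes > 0);
  const double Pi = 3.14159265358979323846;
  double Space = std::ldexp(1.0, 8 * HashBytes); // 2^(8 * HashBytes)
  return std::sqrt(Pi / 2.0 * Space);
}
```

For example, `expectedRecordsForCollision(4)` evaluates to roughly 82,137 and `expectedRecordsForCollision(8)` to roughly 5.38 x 10⁹, in line with the 4-byte and 8-byte rows.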

Benchmarks

Here we will give some benchmarks on large real world applications (specifically, Chrome and clang). The times presented are only for the linker. gn args for each build of chromium are specified at the end.

Toolchain	Mode	blink_core.dll	content.dll	chrome.dll	clang.exe
MSVC	/DEBUG:FULL	553.11s	205.45s	507.17s	62.45s
MSVC	/DEBUG:FASTLINK	116.77s	56.05s	67.80s	29.37s
lld-link	/DEBUG:FULL	121.17s	42.10s	42.31s	24.14s
lld-link	/DEBUG:GHASH	88.71s	33.30s	34.76s	17.99s

The numbers here indicate a reduction in link time of up to 30% by enabling /DEBUG:GHASH in lld.

It's worth mentioning that lld does not yet have support for incremental linking so we could not compare the cost of an incremental link with /DEBUG:GHASH versus MSVC. We still expect incremental linking using MSVC under optimal conditions (e.g. change whitespace in a header file) to produce much faster links than lld is currently able to do.

There are several possible avenues for further optimization though, so we will finish up by discussing them.

Further Improvements

There are several ways to improve the times further, which have yet to be explored.

Use a smaller or faster hash. We use a 20-byte SHA1 hash. This is not a multiple of cache line size, and in any case the probability of collision is astronomically small even in the largest PDBs, considering that the theoretical limit of a PDB is just under 2^32 possible unique types (due to the 4-byte size of a type index). SHA1 is also notoriously slow. It might be interesting to try, for example, a Blake2 set to output an 8 byte hash. This should give sufficiently low probability of a collision while improving cache performance. The on-disk format is designed with this flexibility in mind, as different hash algorithms can be specified in the header.
Hashes for compilands with missing .debug$H sections can be computed in parallel before linking. Currently when we encounter an object file without a .debug$H section, we must synthesize one in the linker. Our prototype algorithm does this serially for each input.
Symbol records from .debug$S sections can be merged in parallel. Currently in lld, we first merge type records into the TPI stream, then we iterate symbol records and remap types in each symbol record to correspond to the new type indices. If we merge types from all modules up front, the symbol records (with the exception of global symbols) can be merged in parallel since they get written to independent streams.

Try it out!

If you're already using clang-cl and lld on Windows today, you can try this out. There are two flags needed to enable this, one for the compiler and one for the linker:

To enable the emission of a .debug$H section by the compiler, you will need to pass the undocumented -mllvm -emit-codeview-ghash-section flag to clang-cl (this flag should go away in the future, once this is considered stable and good enough to be turned on by default).
To tell lld to use this information, you will need to pass the /DEBUG:GHASH flag to lld.

Note that this feature is still considered highly experimental, so we're interested in your feedback (llvm-dev@ mailing list, direct email is ok too) and bug reports (bugs.llvm.org).

Clang ♥ bash -- better auto completion is coming to bash

Tue, 19 Sep 2017 23:37:00 +0000

Compilers are complex pieces of software and have a multitude of command-line options to fine tune parameters. Clang is no exception: it has 447 command-line options. It’s nearly impossible to memorize all these options and their correct spellings, which is where shell completion can be very handy. When you type in the first few characters of a flag and hit tab, it will autocomplete the rest for you.

Background
However, such an autocompletion feature is not available yet, as there's no easy way to get a complete list of the options Clang supports. For example, bash doesn’t have any autocompletion support for Clang, and although some shells like zsh have a script for command-line autocompletion, they use hard-coded lists of command-line options that are not automatically updated when a new option is added to Clang. These shells also can’t autocomplete arguments which some flags take (-std=[tab] for instance).

This is the problem we were working to solve during this year’s Google Summer of Code. We’re adding a feature to Clang so that we can implement a complete, exact command-line option completion which is highly portable for any shell. To start with, we'll provide a completion script for bash which uses this feature.

Implementation

Clang now has a new command line option called --autocomplete. This flag receives the incomplete user input from the shell and then queries the internal data structures of the current Clang binary, and returns a list of possible completions. With this API, we can always get an accurate list of options and values any time, on any newer versions of Clang.

We built an autocompletion using this in bash for the first implementation. You can find its source code here. Also, here is a sample Qt text-entry autocompletion that shows how to use this API from a UI application:

You can complete one flag at a time. So if you want to use the API, you have to select the flag that the user is currently typing, then pass it to the --autocomplete flag of the selected clang binary. In the case below, all flags that start with `-tr` are displayed, each followed by its description (separated from the flag by a tab character).

The API also supports completing the values of flags. If you have a flag for which value completion is supported, you can also provide an incomplete value after the flag, separated by a comma, to get completions for it:

If you provide nothing after the comma, the list of all possible values for this flag is displayed.

How to get it

This feature is available for use now with LLVM/clang 5.0 and we’ll also be adding this feature to the standard bash completion package. Make sure you have the latest clang version on your machine, and source this script. If you want to make the change permanent, just source it from your .bashrc and enjoy typing your clang invocations!

2017 US LLVM Developers' Meeting Program

Mon, 11 Sep 2017 09:00:00 +0000

The LLVM Foundation is excited to announce the selected proposals for the 2017 US LLVM Developers' Meeting!

Keynotes:

Falcon: An optimizing Java JIT - Philip Reames
Compiling Android userspace and Linux kernel with LLVM - Stephen Hines, Nick Desaulniers and Greg Hackmann

Talks:

Apple LLVM GPU Compiler: Embedded Dragons - Marcello Maggioni and Charu Chandrasekaran
Bringing link-time optimization to the embedded world: (Thin)LTO with Linker Scripts - Tobias Edler von Koch, Sergei Larin, Shankar Easwaran and Hemant Kulkarni
Advancing Clangd: Bringing persisted indexing to Clang tooling - Marc-Andre Laperle
The Further Benefits of Explicit Modularization: Modular Codegen - David Blaikie
eval() in C++ - Sean Callanan
Enabling Parallel Computing in Chapel with Clang and LLVM - Michael Ferguson
Structure-aware fuzzing for Clang and LLVM with libprotobuf-mutator - Kostya Serebryany, Vitaly Buka and Matt Morehouse
Adding Index-While-Building and Refactoring to Clang - Alex Lorenz and Nathan Hawes
XRay in LLVM: Function Call Tracing and Analysis - Dean Michael Berris
GlobalISel: Past, Present, and Future - Quentin Colombet and Ahmed Bougacha
Dominator Trees and incremental updates that transcend time - Jakub Kuderski
Scale, Robust and Regression-Free Loop Optimizations for Scientific Fortran and Modern C++ - Tobias Grosser and Michael Kruse
Implementing Swift Generics - Douglas Gregor, Slava Pestov and John McCall
lld: A Fast, Simple, and Portable Linker - Rui Ueyama
Vectorizing Loops with VPlan – Current State and Next Steps - Ayal Zaks and Gil Rapaport
LLVM Compile-Time: Challenges. Improvements. Outlook. - Michael Zolotukhin
Challenges when building an LLVM bitcode Obfuscator - Serge Guelton, Adrien Guinet, Juan Manuel Martinez and Pierrick Brunet
Building Your Product Around LLVM Releases - Tom Stellard
The Type Sanitizer: Free Yourself from -fno-strict-aliasing - Hal Finkel

BoFs:

Storing Clang data for IDEs and static analysis - Marc-Andre Laperle
Source-based Code Coverage BoF - Eli Friedman and Vedant Kumar
Clang Static Analyzer BoF - Devin Coughlin, Artem Dergachev and Anna Zaks
Co-ordinating RISC-V development in LLVM - Alex Bradbury
Thoughts and State for Representing Parallelism with Minimal IR Extensions in LLVM - Xinmin Tian, Hal Finkel, Tb Schardl, Johannes Doerfert and Vikram Adve
BoF - Loop and Accelerator Compilation Using Integer Polyhedra - Tobias Grosser and Hal Finkel
LLDB Future Directions - Zachary Turner and David Blaikie
LLVM Foundation - Status and Involvement - LLVM Foundation Board of Directors

Tutorials:

Writing Great Machine Schedulers - Javed Absar and Florian Hahn
Tutorial: Head First into GlobalISel - Daniel Sanders, Aditya Nandakumar and Justin Bogner
Welcome to the back-end: The LLVM machine representation. - Matthias Braun

Lightning Talks:

Porting OpenVMS using LLVM - John Reagan
Porting LeakSanitizer: A Beginner's Guide - Francis Ricci
Introsort based sorting function for libc++ - Divya Shanmughan and Aditya Kumar
Code Size Optimization: Interprocedural Outlining at the IR Level - River Riddle
ThreadSanitizer APIs for external libraries - Kuba Mracek
A better shell command-line autocompletion for clang - Yuka Takahashi
A CMake toolkit for migrating C++ projects to clang’s module system. - Raphael Isemann
Debugging of optimized code: Extending the lifetime of local variables - Wolfgang Pieb
An LLVM based Loop Profiler - Shalini Jain, Kamlesh Kumar, Suresh Purini, Dibyendu Das and Ramakrishna Upadrasta
Compiling cross-toolchains with CMake and runtimes build - Petr Hosek

Student Research Competition:

VPlan + RV: A Proposal - Simon Moll and Sebastian Hack
Polyhedral Value & Memory Analysis - Johannes Doerfert and Sebastian Hack
DLVM: A Compiler Framework for Deep Learning DSLs - Richard Wei, Vikram Adve and Lane Schwartz
Leveraging LLVM to Optimize Parallel Programs - William Moses
Exploiting and improving LLVM's data flow analysis using superoptimizer - Jubi Taneja and John Regehr

Posters:

Venerable Variadic Vulnerabilities Vanquished - Priyam Biswas, Alessandro Di Federico, Scott A. Carr, Prabhu Rajasekaran, Stijn Volckaert, Yeoul Na, Michael Franz and Mathias Payer
Extending LLVM’s masked.gather/scatter Intrinsic to Read/write Contiguous Chunks from/to Arbitrary Locations. - Farhana Aleen, Elena Demikhovsky, Hideki Saito, and David Kreitzer
An LLVM based Loop Profiler - Shalini Jain, Kamlesh Kumar, Suresh Purini, Dibyendu Das and Ramakrishna Upadrasta
Leveraging Compiler Optimizations to Reduce Runtime Fault Recovery Overhead - Fateme S. Hosseini, Pouya Fotouhi, Chengmo Yang and Guang R. Gao
Polyhedral Optimizations and transparent GPU offloading for Julia by Polly - Sanjay Srivallabh Singapuram
Improving debug information in LLVM to recover optimized-out function parameters - Ananth Sowda and Ivan Baev
Adding Debug Information and Merge Attribute to Merge-Functions LLVM passes - Anmol Paralkar
ALLVM: LLVM All the Things! - Will Dietz and Vikram Adve
Project Sulong - Executing LLVM IR on top of a JVM - Matthias Grimmer and Christian Wimmer
JIT Fuzzing Solver: A LibFuzzer based constraint solver - Daniel Liew, Cristian Cadar and Alastair Donaldson
Non-determinism in LLVM Code Generation - Mandeep Singh Grang

If you are interested in any of these talks, you should register to attend the 2017 US LLVM Developers' Meeting! Tickets are limited, so register now!

LLVM on Windows now supports PDB Debug Info

Fri, 18 Aug 2017 12:55:00 +0000

For several years, we’ve been hard at work on making clang a world class toolchain for developing software on Windows. We’ve written about this several times in the past, and we’ve had full ABI compatibility (minus bugs) for some time. One area that has been notoriously hard to achieve compatibility on is debug information, but over the past 2 years we’ve made significant leaps. If you just want the TL;DR, then here you go: If you’re using clang on Windows, you can now get PDB debug information!

Background: CodeView vs. PDB

CodeView is a debug information format invented by Microsoft in the mid 1980s. For various reasons, other debuggers developed an independent format called DWARF, which eventually became standardized and is now widely supported by many compilers and programming languages. CodeView, like DWARF, defines a set of records that describe mappings between source lines and code addresses, as well as types and symbols that your program uses. The debugger then uses this information to let you set breakpoints by function name, display the value of a variable, etc. But CodeView is only somewhat documented, with the most recent official documentation being at least 20 years old. While some records still have the format documented above, others have evolved, and entirely new records have been introduced that are not documented anywhere.

It’s important to understand though that CodeView is just a collection of records. What happens when the user says “show me the value of Foo”? The debugger has to find the record that describes Foo. And now things start getting complicated. What optimizations are enabled? What version of the compiler was used? (These could be important if there are certain ABI incompatibilities between different versions of the compiler, or as a hint when trying to reconstruct a backtrace in heavily optimized code, or if the stack has been smashed). There are a billion other symbols in the program, how can we find the one named Foo without doing an exhaustive O(n) search? How can we support incremental linking so that it doesn’t take a long time to re-generate debug info when only a small amount of code has actually changed? How can we save space by de-duplicating strings that are used repeatedly? Enter PDB.

PDB (Program Database) is, as you might have guessed from the name, a database. It contains CodeView but it also contains many other things that allow indexing of the CodeView records in various ways. This allows for fast lookups of types and symbols by name or address, the philosophical equivalent of “tables” for individual input files, and various other things that are mostly invisible to you as a user but largely responsible for making the debugging experience on Windows so great. But there’s a problem: While CodeView is at least kind-of documented, PDB is completely undocumented. And it’s highly non-trivial.

We’re Stuck (Or Are We?)

Several years ago, we decided that the path forward was to abandon any hope of emitting CodeView and PDB, and instead focus on two things:

Make clang-cl emit DWARF debug information on Windows
Port LLDB to Windows and teach it about the Windows ABI, which would be significantly easier than teaching Visual Studio and/or WinDbg to be able to interpret DWARF (assuming this is even possible at all, given that everything would have to be done strictly through the Visual Studio / WinDbg extensibility model)

In fact, I even wrote another blog post about this very topic a little over 2 years ago. So I got it to work, and I eventually got parts of LLDB working on Windows for simple debugging scenarios.

Unfortunately, it was beginning to become clear that we really needed PDB. Our goal has always been to create as little friction as possible for developers who are embedded in the Windows ecosystem. Tools like Windows Performance Analyzer and vTune are very powerful and standard tools in engineers’ existing repertoires. Organizations already have infrastructure in place to archive PDB files, and collect & analyze crash dumps. Debugging with PDB is extremely responsive given that the debugger does not have to index symbols upon startup, since the indices are built into the file format. And last but not least, tools such as WinDbg are already great for post-mortem debugging, and frankly many (perhaps even most) Windows developers will only give up the Visual Studio debugger when it is pried from their cold dead hands.

I got some odd stares (to put it lightly) when I suggested that we just ask Microsoft if they would help us out. But ultimately we did, and… they agreed! This came in the form of some code uploaded to the Microsoft Github repo which we were on our own to figure out. Although they were only able to upload a subset of their PDB code (meaning we had to do a lot of guessing and exploration, and the code didn’t compile either since half of it was missing), it filled in enough blanks that we were able to do the rest.

After about a year and a half of studying this code, hacking away, studying the code some more, hacking away some more, etc, I’m proud to say that lld (the LLVM linker) can finally emit working PDBs. All the basics like setting breakpoints by line, or by name, or viewing variables, or searching for symbols or types, everything works (minus bugs, of course).

For those of you who are interested in digging into the internals of a PDB, we also have been developing a tool for expressly this purpose. It’s called llvm-pdbutil and is the spiritual counterpart to Microsoft’s own cvdump utility. It can dump the internals of a PDB, convert a PDB to yaml and vice versa, find differences between two PDBs, and much more. Brief documentation for llvm-pdbutil is here, and a detailed description of the PDB file format internals is here, consisting of everything we’ve learned over the past 2 years (still a work in progress, as I have to divide my time between writing the documentation and actually making PDBs work).

Bring on the Bugs!

So this is where you come in. We’ve tested simple debugging scenarios with our PDBs, but we still consider this alpha in terms of debug info quality. We’d love for you to try it out and report issues on our bug tracker. To get you started, download the latest snapshot of clang for Windows. Here are two simple ways to test out this new functionality:

Have clang-cl invoke lld automatically

clang-cl -fuse-ld=lld -Z7 -MTd hello.cpp

Invoke clang-cl and lld separately.

clang-cl -c -Z7 -MTd -o hello.obj hello.cpp
lld-link -debug hello.obj

We look forward to the onslaught of bug reports!

We would like to extend a very sincere and deep thanks to Microsoft for their help in getting the code uploaded to the github repository, as we would never have gotten this far without it.

And to leave you with something to get you even more excited for the future, it's worth reiterating that all of this is done without a dependency on any Windows-specific API, DLL, or library. It's 100% portable. Do I hear cross-compilation?

Zach Turner (on behalf of the LLVM Windows Team)

Devirtualization in LLVM and Clang

Fri, 10 Mar 2017 13:23:00 +0000

This blog post shows how C++ devirtualization is performed in current (4.0) clang and LLVM, and describes ongoing work on the -fstrict-vtable-pointers feature.

Devirtualization done by the frontend

In order to transform a virtual call into a direct call, the frontend must be sure that there are no overrides of vfunction in the program or know the dynamic type of object. Compilation proceeds one translation unit at a time, so, barring LTO, there are only a few cases when the compiler may conclude that there are no overrides:

either the class or virtual method is marked as final
the class is defined in an anonymous namespace and has no deriving classes in its translation unit
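The two cases above can be illustrated with a small example. All names here are hypothetical, and whether a given compiler actually emits a direct call depends on its version and optimization settings; the sketch only shows source patterns where the frontend has enough information to do so.

```cpp
#include <cassert>

struct Base {
  virtual int value() const { return 1; }
  virtual ~Base() = default;
};

// Case 1: the class is marked final, so value() cannot be overridden further.
struct FinalDerived final : Base {
  int value() const override { return 2; }
};

int callFinal(const FinalDerived &D) {
  return D.value(); // may be emitted as a direct call to FinalDerived::value
}

// Case 2: the class is defined in an anonymous namespace and nothing in this
// translation unit derives from it.
namespace {
struct Local : Base {
  int value() const override { return 3; }
};

// Internal linkage: every caller is visible in this translation unit.
int callLocal(const Local &L) { return L.value(); }
} // namespace

// Small driver exercising both calls; the asserts document expected results.
void runDevirtualizationExamples() {
  FinalDerived F;
  Local L;
  assert(callFinal(F) == 2);
  assert(callLocal(L) == 3);
}
```

In the second case the compiler can only be sure there are no deriving classes once it has seen the whole translation unit.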

The latter is more tricky for clang, which translates the source code in chunks on the fly (see: ASTProducer and ASTConsumer), so is not able to determine if there are any deriving classes later in the source. This could be dealt with in a couple of ways:

give up immediate generation
run data flow analysis in LLVM to find all the dynamic types passed to function, which has static linkage
hope that every use of the virtual function, which is necessarily in the same translation unit, will be inlined by LLVM -- static linkage increases the chances of inlining

Store to load propagation in LLVM

In order to devirtualize a virtual call we need:

value of vptr - which virtual table is pointed by it
value of vtable slot - which exact virtual function it is

Because vtables are constant, the latter value is much easier to get when we have the value of vptr. The only thing we need is vtable definition, which can be achieved by using available_externally linkage.

In order to figure out the vptr value, we have to find the store to the same location that defines it. There are two analyses responsible for it:

MemDep (Memory Dependence Analysis) is a simple linear algorithm that, for each queried instruction, iterates through all instructions above it and stops when the first dependency is found. Because queries might be performed for each instruction, we end up with a quadratic algorithm. Of course quadratic algorithms are not welcome in compilers, so MemDep can only check a certain number of instructions.
Memory SSA on the other hand has constant complexity because of caching. To find out more, watch “Memory SSA in 5 minutes” (https://www.youtube.com/watch?v=bdxWmryoHak). MemSSA is a pretty new analysis and it doesn’t have all the features MemDep has, therefore MemDep is still widely used.

The LLVM main pass that does store to load propagation is GVN - Global Value Numbering.

Finding vptr store

In order to figure out the vptr value, we need to see store from constructor. To not rely on constructor's availability or inlining, we decided to use the @llvm.assume intrinsic to indicate the value of vptr. Assume is akin to assert - optimizer seeing call to @llvm.assume(i1 %b) can assume that %b is true after it. We can indicate vptr value by comparing it with the vtable and then call the @llvm.assume with the result of this comparison.

call void @_ZN1AC1Ev(%struct.A* %a) ; call ctor
%3 = load {...} %a ; Load vptr
%4 = icmp eq %3, @_ZTV1A ; compare vptr with vtable
call void @llvm.assume(i1 %4)

Calling multiple virtual functions

A non-inlined virtual call will clobber the vptr. In other words, the optimizer will have to assume that the vfunction might change the vptr in the passed object. This sounds like something that never happens because vptr is “const”. The truth is that it is actually weaker than a C++ const member, because it changes multiple times during construction of an object (every base type constructor or destructor must set vptrs). But vptr can't be changed during a virtual call, right? Well, what about this?

void A::foo() { // virtual
static_assert(sizeof(A) == sizeof(Derived));
new(this) Derived;
}

This is a call of the placement new operator - it doesn’t allocate new memory, it just creates a new object in the provided location. So, by constructing a Derived object in the place where an object of type A was living, we change the vptr to point to Derived’s vtable. Is this code even legal? C++ Standard says yes.

However it turns out that if someone called foo 2 times (with the same object), the second call would be undefined behavior. Standard pretty much says that call or dereference of a pointer to an object whose lifetime has ended is UB, and because the standard agrees that nuking object from inside ends its lifetime, the second call is UB. Be aware that this is only because a zombie pointer is used for the second call. The pointer returned by placement new is considered alive, so performing calls on that pointer is valid. Note that we also silently used that fact with the use of assume.

(un)clobbering vptr

We need to somehow say that vptr is invariant during its lifetime. We decided to introduce a new metadata for that purpose - !invariant.group. The presence of the invariant.group metadata on the load/store tells the optimizer that every load and store to the same pointer operand within the same invariant group can be assumed to load or store the same value. With -fstrict-vtable-pointers Clang decorates vtable loads with invariant.group metadata corresponding to the caller pointer type.

We can enhance the load of the virtual function (second load) by decorating it with !invariant.load, which is equivalent to saying “load from this location is always the same”, which is true because vtables never change. This way we don’t rely on having the definition of the vtable.

Call like:

void g(A *a) {
a->foo();
a->foo();
}

Will be translated to:

define void @function(%struct.A* %a) {
%1 = load {...} %a, !invariant.group !0
%2 = load {...} %1, !invariant.load !1
call void %2(%struct.A* %a)

%3 = load {...} %a, !invariant.group !0
%4 = load {...} %3, !invariant.load !1
call void %4(%struct.A* %a)
ret void
}

!0 = !{!"_ZTS1A"} ; mangled type name of A
!1 = !{}

And now by magic of GVN and MemDep:

define void @function(%struct.A* %a) {
%1 = load {...} %a, !invariant.group !0
%2 = load {...} %1, !invariant.load !1
call void %2(%struct.A* %a)
call void %2(%struct.A* %a)
ret void
}

With this, llvm-4.0 is able to devirtualize function calls inside loops.
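As an illustration of the source pattern this targets, here is a small sketch with hypothetical names (Shape, Triangle, total_sides). Every iteration makes a virtual call on the same object, so the vptr load and the function-pointer load are the same each time. Whether the loads actually get hoisted depends on the compiler version and on building with optimizations and -fstrict-vtable-pointers.

```cpp
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};

struct Triangle : Shape {
    int sides() const override { return 3; }
};

// The dynamic type of `s` is unknown here, so each call is a virtual call.
// The vptr of `s` cannot legally change between iterations, which is the
// fact that !invariant.group lets the optimizer rely on.
int total_sides(const Shape &s, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += s.sides();
    return total;
}

int triangle_total(int n) {
    Triangle t;
    return total_sides(t, n);
}
```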

Barriers

In order to prevent the middle-end from finding loads/stores with the same !invariant.group metadata that would come from the construction/destruction of a dead dynamic object, @llvm.invariant.group.barrier was introduced. It returns another pointer that aliases its argument but is considered different for the purposes of load/store invariant.group metadata. The optimizer won’t be able to figure out that the returned pointer is the same because intrinsics don’t have a definition. A barrier must be inserted in all the places where the dynamic object changes:

constructors
destructors
placement new of dynamic object

Dealing with barriers

Barriers hinder some other optimizations. Some ideas for how this could be fixed:

stripping invariant.group metadata and barriers just after devirtualization. Currently it is done before codegen. The problem is that most of the devirtualization comes from GVN, which also does most of the optimizations we would miss with barriers. GVN is expensive therefore it is run only once. It also might make less sense if we are in LTO mode, because that would limit the devirtualization in the link phase.
teaching important passes to look through the barrier. Preserving the semantics of the barrier this way might be very tricky, but e.g. looking for the dependency of a load without invariant.group by jumping through the barrier to find a store without invariant.group is likely to do the trick.
removing invariant.barrier when its argument comes from alloca and is never used etc.

To find out more details about devirtualization check my talk (http://llvm.org/devmtg/2016-11/#talk6) from LLVM Dev Meeting 2016.

About author

Undergraduate student at University of Warsaw, currently working on C++ static analysis in IIIT.

Some news about apt.llvm.org

Mon, 06 Mar 2017 12:59:00 +0000

apt.llvm.org provides Debian and Ubuntu repositories for every maintained version of these distributions. LLVM, Clang, clang extra tools, compiler-rt, polly, LLDB and LLD packages are generated for the stable, stabilization and development branches.

As it seems that we have more and more users of these packages, I would like to share an update about various recent changes.

New features

LLD
First, the cool new stuff: lld is now offered and built for i386/amd64 on all supported Debian and Ubuntu versions. The test suite is also executed and the coverage results are great.

4.0
Then, following the branching for the 4.0 release, I created new repositories to offer this release.
For example, for Debian stable, just add the following to /etc/apt/sources.list.d/llvm.list:

deb http://apt.llvm.org/jessie/ llvm-toolchain-jessie-4.0 main
deb-src http://apt.llvm.org/jessie/ llvm-toolchain-jessie-4.0 main

llvm-defaults
Obviously, the trunk is now 5.0. If llvm-defaults is used, clang, lldb and other meta packages will be automatically updated to this version.
As a consequence and also because the branches are dead, 3.7 and 3.8 jobs have been disabled. Please note that both repositories are still available on apt.llvm.org and won't be removed.

Zesty: New Ubuntu
Packages for the next Ubuntu 17.04 (zesty) are also generated for 3.9, 4.0 and 5.0.

libfuzzer
This was implemented a few months ago but not clearly communicated: libfuzzer also has its own packages, named libfuzzer-X.Y-dev (for example libfuzzer-3.9-dev, libfuzzer-4.0-dev or libfuzzer-5.0-dev).

Changes in the infrastructure

In order to support the load, I started to use new blades that Google (thanks again to Nick Lewycky) sponsored for an initiative that I was running for Debian and IRILL. The 6 new blades removed all the wait time. With a new salt configuration, I automated the deployment of the slaves. In case the load increases again, we will have access to more blades.

I also took the time to fix some long ongoing issues:

all repositories are signed, and the signatures are verified
i386 and amd64 packages are now uploaded at once instead of being uploaded separately. This was causing checksum errors when one of the two architectures built correctly and the second one failed (e.g. because of a failing test)

Last but not least, the code coverage results are produced in a more reliable manner.

More information about the implementation and services.

As what is shipped on apt.llvm.org is exactly the same as in Debian and Ubuntu, packaging files are stored on the Debian subversion server.

A Jenkins instance is in charge of the orchestration of the whole build infrastructure.

The trunk packages are built twice a day for every supported Debian and Ubuntu version. Branches (3.9 and 4.0 currently) are rebuilt only when the - trigger job finds a change.

In both cases, the Jenkins source job will check out the Debian SVN branches for their version, check out/update the LLVM/clang/etc. repositories and repack everything to create the source tarballs and Debian files (dsc, etc.). The completion of this job will trigger the binaries job to start. These jobs, thanks to the Debian Jenkins glue, will create or update Debian/Ubuntu versions.

Then builds are done the usual way through pbuilder for both i386 and amd64. All the test suites are going to be executed. If any LLVM test is failing on i386 or amd64, the whole build will fail. If both builds and the LLVM testsuite are successful, the sync job will start and rsync packages to the LLVM server to be replicated on the CDN. If one or both builds fail, a notification is sent to the administrator.

Debian static analysis (lintian) is run on the packages to catch packaging errors. From time to time, some interesting issues are found.

In parallel, some binary builds have some special hooks like Coverity, code coverage or installation of more recent versions of gcc for Ubuntu precise.

Report bugs

Bugs can be reported on the bugzilla of the LLVM project in the product "Packaging" and the component "deb packages".

Common issues

Because LLVM and clang are fast-moving projects, it can be challenging to keep up with them, in particular with regard to tests. For Debian unstable or the latest version of Ubuntu, the matrix is further complicated by new versions of the basic pieces of the operating system like gcc/g++ or libstdc++.

It is also not uncommon for some tests to be ignored in the process.

How to help

Some newcomer bugs are available. As an example:

Move the compiler-rt libraries into a specific package to simplify their usage by other packages
Ship libc++ as part of llvm-toolchain packages
Ship openmp library as part of llvm-toolchain packages
Full bootstrap of llvm-toolchain (use a clang built during the process)
...

Related to all this, a Google Summer of Code 2017 under the LLVM umbrella has been proposed: Integrate libc++ and OpenMP in apt.llvm.org

Help is also needed to keep track of the new test failures and get them fixed upstream. For example, a few tests have been marked as expected to fail to avoid crashes.

2016 LLVM Developers' Meeting - Experience from Johannes Doerfert, Travel Grant Recipient

Tue, 21 Feb 2017 23:26:00 +0000

This blog post is part of a series of blog posts from students who were funded by the LLVM Foundation to attend the 2016 LLVM Developers' Meeting in San Jose, CA. Please visit the LLVM Foundation's webpage for more information on our Travel Grants program.

This post is from Johannes Doerfert:
2016 was my third time attending the US LLVM developers meeting and for the third year in a row I was impressed by the quality of the talks, the organization and the diversity of attendees. The hands on experiences that are presented, combined with innovative ideas and cutting edge research makes it a perfect venue for me as a PhD student. The honest interest in the presented topics and the lively discussions that include students, professors and industry people are two of the many things that I experienced the strongest at these developer meetings.

For the last two years I was mainly attending as a Polly developer that talked about new features and possible applications of Polly. This year however my roles were different. First, I was attending as part of the organization team of the European LLVM developers meeting 2017 [0] together with my colleagues Tina Jung and Simon Moll. In this capacity I answered questions about the venue (Saarbruecken, Germany [1,2]) and the alterations in contrast to prior meetings. Though, more importantly, I advertised the meeting to core developers that usually do not attend the European version. Second on my agenda was the BoF on a parallel extension to the LLVM-IR which I organized with Simon Moll. In this BoF, but also during the preparation discussion on the mailing list [3], we tried to collect motivating examples, requirements as well as known challenges for a parallel extension to LLVM. These insights will be used to draft a proposal that can be discussed in the community.

Finally, I attended as a 4th year PhD student who is interested in contributing his work to the LLVM project (not only Polly). As my current research required a flexible polyhedral value (and iteration space) analysis, I used the opportunity to implement one with an interface similar to scalar evolution. The feedback I received on this topic was strictly positive. I will soon post a first version of this standalone analysis and start a public discussion. Since I hope to finish my studies at some (not too distant) point in time, I seized the opportunity to inquire about potential options for the time after my PhD.

As a final note I would like to thank the LLVM Foundation for their student travel grant that allowed me to attend the meeting in the first place.

[0] http://llvm.org/devmtg/2017-03/
[1] http://sic.saarland/
[2] https://en.wikipedia.org/wiki/Saarbr%C3%BCcken
[3] http://lists.llvm.org/pipermail/llvm-dev/2016-October/106051.html

LLVM's New Versioning Scheme

Wed, 14 Dec 2016 13:25:00 +0000

Historically, LLVM's major releases always added "0.1" to the version number, producing major versions like 3.8, 3.9, and 4.0 (expected by March 2017). With our next release though, we're changing this. The LLVM version number will now increase by "1.0" with every major release, which means that the first major release after LLVM 4.0 will be LLVM 5.0 (expected September 2017).
We believe that this approach will provide a simpler and more standard approach to versioning.
LLVM’s version number (also shared by many of its sub-projects, such as Clang, LLD, etc.) consists of three parts: major.minor.patch. The community produces a new release every six months, with "patch" releases (also known as "dot" or "stable" releases) containing bug fixes in between.
Until now, the six-monthly releases would cause the minor component of the version to be incremented. Every five years, after minor reached 9, a more major release would occur, including some breaking changes: 2.0 introduced the bitcode format, 3.0 a type system rewrite.
During the discussions about what to call the release after 3.9, it was pointed out that since our releases are time-based rather than feature-based, the distinction between major and minor releases seems arbitrary. Further, every release is also API breaking, so by the principles of semantic versioning, we should be incrementing the major version number.
We decided that going forward, every release on the six-month cycle will be a major release. Patch releases will increment the patch component as before (producing versions like 5.0.1), and the minor component will stay at zero since no minor releases will be made.

Bitcode Compatibility

Before LLVM 4.0.0, the Developer Policy specified that bitcode produced by LLVM would be readable by the next versions up to and including the next major release. The new version of the Developer Policy instead specifies that LLVM will currently load any bitcode produced by version 3.0 or later. When developers decide to drop support for some old bitcode feature, the policy will be updated.

API Compatibility

Nothing has changed. As before, patch releases are API and ABI compatible with the main releases, and the C API is "best effort" for stability, but besides that, LLVM’s API changes between releases.

What About the Minor Version?

Since the minor version is expected to always be zero, why not drop it and just use major.patch as the version number?
Dropping the minor component from the middle of the version string would introduce ambiguity: whether to interpret x.y as major.minor or major.patch would then depend on the value of x.
The version numbers are also exposed through various APIs, such as LLVM's llvm-config.h and Clang's __clang_minor__ preprocessor macro. Removing the minor component from these APIs would break a lot of existing code.
Going forward, since the minor number will be zero and patch releases are compatible, I expect we will generally refer to versions simply by their major number and treat the rest of the version string as details (just as Chromium 55 might really be 55.0.2883.76). Future versions of LLVM and Clang can generally be referred to simply as "LLVM 4" or "Clang 5".
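As a small illustration of the major.minor.patch layout described above, here is a sketch of a version-string parser. The type and function names (Version, parse_version, is_patch_release, short_name) are hypothetical and are not part of any LLVM API.

```cpp
#include <cstdio>
#include <string>

struct Version {
    int major_num = 0;
    int minor_num = 0;
    int patch_num = 0;
};

// Parses exactly "major.minor.patch"; returns false for any other shape,
// including the ambiguous two-component form "x.y".
bool parse_version(const std::string &text, Version &out) {
    int consumed = 0;
    int fields = std::sscanf(text.c_str(), "%d.%d.%d%n", &out.major_num,
                             &out.minor_num, &out.patch_num, &consumed);
    return fields == 3 && consumed == static_cast<int>(text.size());
}

// Under the new scheme the minor component stays at zero, so a patch
// release is recognized by a non-zero patch component.
bool is_patch_release(const Version &v) { return v.patch_num > 0; }

// Short form such as "LLVM 5"; only meaningful for 4.0 and later, where
// the major number alone identifies the release.
std::string short_name(const Version &v) {
    return "LLVM " + std::to_string(v.major_num);
}
```

Keeping all three components in the parser mirrors the argument above: existing code that expects major.minor.patch continues to work, and the minor field simply reads as zero.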

Announcing the next LLVM Foundation Board of Directors

Mon, 12 Sep 2016 09:55:00 +0000

The LLVM Foundation is pleased to announce its new Board of Directors:

Chandler Carruth

Hal Finkel

Arnaud de Grandmaison

David Kipping

Anton Korobeynikov

Tanya Lattner

Chris Lattner

John Regehr

Three new members and five continuing members were elected to the eight person board. The new board consists of individuals from corporations and from the academic and scientific communities. They also represent various geographical groups of the LLVM community. All board members are dedicated and passionate about the programs of the LLVM Foundation and growing and supporting the LLVM community.

When voting on new board members, we took into consideration all contributions (past and present) and current involvement in the LLVM community. We also tried to create a balanced board of individuals from a wide range of backgrounds and locations to provide a voice to as many groups within the LLVM community as possible.

We want to thank everyone who applied as we had many strong applications. As the programs of the LLVM Foundation grow we will be relying on volunteers to help us reach success. Please join our mailing list to be informed of volunteer opportunities.

About the board of directors (listed alphabetically by last name):

Chandler works at Google Inc. as a technical lead for their C++ developer platform and has served on the LLVM Foundation board of directors for the last 2 years.

Hal Finkel has been an active contributor to the LLVM project since 2011. He is the code owner for the PowerPC target, alias-analysis infrastructure, loop re-roller and the basic-block vectorizer.

In addition to his numerous technical contributions, Hal has chaired the LLVM in HPC workshop, which is held in conjunction with Super Computing (SC), for the last 3 years. This workshop provides a venue for the presentation of peer-reviewed HPC-related research on LLVM from both industry and academia. He has also been involved in organizing an LLVM-themed BoF session at SC and LLVM socials in Austin.

Hal is Lead for Compiler Technology and Programming Languages at Argonne National Laboratory’s Leadership Computing Facility.

Arnaud de Grandmaison has been hacking on LLVM projects since 2008. In addition to his open source contributions, he has worked for many years on private out-of-tree LLVM-based projects at Parrot, DiBcom, or ARM. He has also been a leader in the European LLVM community by organizing the EuroLLVM Developers’ meeting and Paris socials, and has chaired or participated in numerous program committees for the LLVM Developers’ Meetings and other LLVM related conferences.

Arnaud is a Principal Engineer at ARM.

David Kipping has been involved with the LLVM project since 2010. He has been a key organizer and supporter of many LLVM community events such as the US and European LLVM Developers’ Meetings. He has served on many of the program committees for these events.

David has worked hard to advance the adoption of LLVM at Qualcomm and other companies. One such example of his efforts is the LLVM track he created at the 2011 Linux Collaboration summit. He has over 30 years experience in open source and developer tools including working on C++ at Borland.

David has served on the board of directors for the last 2 years and has held the officer position of treasurer. The treasurer is a time-demanding position: he supports the day-to-day operation of the foundation, balances the books, and generates monthly treasurer reports.

David is Director of Product Management at Qualcomm and has served on the LLVM Foundation board of directors for the last 2 years.

In addition to his technical contributions, Anton has maintained LLVM’s participation in Google Summer of Code by managing applications, deadlines, and overall organization. He also supports the LLVM infrastructure and has been on numerous program committees for the LLVM Developers’ Meetings (both US and EuroLLVM).

Anton is currently an associate professor at the Saint Petersburg State University and has served on the LLVM Foundation board of directors for the last 2 years.

With the support of the initial board of directors, Tanya created the LLVM Foundation, defined its charitable and education mission, and worked to get 501(c)(3) status.

Tanya is the Chief Operating Officer and has served as the President of the LLVM Foundation board for the last 2 years.

Chris has attended every LLVM Developers’ meeting, and presented at the majority. He helped drive the conception and incorporation of the LLVM Foundation, and has served as Secretary of the board for the last 2 years. Chris also grants commit access to the LLVM Project, moderates mailing lists, moderates and edits the LLVM blog, and drives important non-technical discussions and policy decisions related to the LLVM project.

Chris manages the Developer Tools department at Apple Inc and has served on the LLVM Foundation board of directors for the last 2 years.

LLVM Weekly - #130, Jun 27th 2016

Mon, 27 Jun 2016 08:00:00 +0000

Welcome to the one hundred and thirtieth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.
If you're reading this on blog.llvm.org then do note this is the LAST TIME it will be cross-posted there directly. There is a great effort underway to increase the content on the LLVM blog, and unfortunately LLVM Weekly has the effect of drowning out this content. As ever, you can head to http://llvmweekly.org, subscribe to get it by email, or subscribe to the RSS feed.
The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

After recently being taken down due to excessive resource usage, the LLVM apt repositories are now back.
A detailed introduction to ThinLTO has been published on the LLVM blog. This covers the background, design, current status, and usage information for ThinLTO.
A post on Reddit gives a summary of notable language features voted into the C++17 working draft at the Oulu meeting.

On the mailing lists

Sanjoy Das has written an RFC on strong GC references in LLVM. The motivating case for this proposal is supporting a precise, relocating garbage collector.
LLVM version 3.8.1-final has been tagged.
The Google Summer of Code mid-terms have snuck up on us already. The participating students have posted to the mailing lists with a summary of their work so far.
Vivek Pandya wrote to the mailing list seeking advice on adding a new calling convention for interprocedural register allocation. Matthias Braun summarised some follow-up discussion.

LLVM commits

The new representation for control-flow integrity and virtual call metadata has landed. The commit message further details the problems this change addresses. r273729.
The llvm.type.checked.load intrinsic was added. It loads a function pointer from a virtual table pointer using type metadata. r273576.
As part of the work on CFL-AA, interprocedural function summaries were added. These avoid recomputation for many properties of a function. r273219, r273596.
MemorySSA gained new APIs for PHI creation and MemoryAccess creation. r273295.
Metadata attachments are now allowed for declarations. r273336.
A new runtimes directory was added to the LLVM tree. r273620.
LLVM's dynamic loader gained basic support for COFF ARM. r273682.

Clang commits

constexpr if support has been added to Clang. r273602.
clang-tidy has a new modernize-use-emplace check that will replace calls to push_back with emplace_back. r273275.
The CMake build system for Clang gained an ENABLE_X86_RELAX_RELOCATIONS option. r273224.

Other project commits

Basic support for versioned symbols was added to LLD. r273143.
LLD now handles both single and double dashes for all options. r273256.

ThinLTO: Scalable and Incremental LTO

Tue, 21 Jun 2016 06:01:00 +0000

ThinLTO was first introduced at EuroLLVM in 2015, with results shown from a prototype implementation within clang and LLVM. Since then, the design was reviewed through several RFCs, it has been implemented in LLVM (for gold and libLTO), and tuning is ongoing. Results already show good performance for a number of benchmarks, with compile time close to a non-LTO build.

This blog post covers the background, design, current status and usage information.

This post was written by Teresa Johnson, Mehdi Amini and David Li.

LTO Background and Motivation

LTO (Link Time Optimization) is a method for achieving better runtime performance through whole-program analysis and cross-module optimization. During the compile phase, clang will emit LLVM bitcode instead of an object file. The linker recognizes these bitcode files and invokes LLVM during the link to generate the final objects that will constitute the executable. The LLVM implementation loads all input bitcode files and merges them together to produce a single Module. The interprocedural analyses (IPA) as well as the interprocedural optimizations (IPO) are performed serially on this monolithic Module.

What this means in practice is that LTO often requires a large amount of memory (to hold all IR at once) and is very slow. And with debug information enabled via -g, the size of the IR and the resulting memory requirements are significantly larger. Even without debug information, this is prohibitive for very large applications, or when compiling on memory-constrained machines. It also makes incremental builds less effective, as everything from the LTO step on must be re-executed when any input source changes.

ThinLTO Design

ThinLTO is a new approach that is designed to scale like a non-LTO build, while retaining most of the performance achievement of full LTO.

In ThinLTO, the serial step is very thin and fast. This is because instead of loading the bitcode and merging it into a single monolithic module to perform these analyses, it utilizes compact summaries of each module for global analyses in the serial link step, as well as an index of function locations for later cross module importing. The function importing and other IPO transformations are performed later when the modules are optimized in fully parallel backends.

The key transformation enabled by ThinLTO global analyses is function importing, in which only those functions likely to be inlined are imported into each module. This minimizes the memory overhead in each ThinLTO backend, while maximizing the most impactful cross module optimization opportunities. The IPO transformations are therefore performed on each module extended with its imported functions.

The ThinLTO process is divided into 3 phases:

Compile: Generate IR as with full LTO mode, but extended with module summaries
Thin Link: Thin linker plugin layer to combine summaries and perform global analyses
ThinLTO backend: Parallel backends with summary-based importing and optimizations

By default, linkers that support ThinLTO (see below) are set up to launch the ThinLTO backends in threads. So the distinction between the second and third phases is transparent to the user.

The key enablers for this process are the summaries emitted during phase 1. These summaries are emitted using the bitcode format, but designed so that they can be separately loaded without involving an LLVMContext or any other expensive construction. Each global variable and function has an entry in the module summary. An entry contains metadata that abstracts the symbol it is describing. For example, a function is abstracted with its linkage type, the number of instructions it contains, and optional profiling information (PGO). Additionally, every reference (address taken, direct call) to another global is recorded. This information enables building a complete reference graph during the Thin Link phase, and subsequent fast analyses using the global summary information.
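To give a feel for how summary-only analysis can work, here is a heavily simplified sketch. The types and the size threshold are assumptions made for illustration (FunctionSummary, CombinedIndex, import_candidates and max_instructions are hypothetical names); the real summary format and import heuristics in LLVM are richer than this.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Simplified stand-in for one function's entry in a module summary.
struct FunctionSummary {
    std::string module;             // module that defines the function
    unsigned instruction_count;     // size proxy, as recorded in the summary
    std::vector<std::string> calls; // names of directly called functions
};

// Combined index built during the Thin Link: function name -> summary.
using CombinedIndex = std::map<std::string, FunctionSummary>;

// Walks the call edges leaving `module` and returns the callees defined in
// other modules that are small enough to be worth importing. No IR is
// loaded; the decision uses summary data only.
std::set<std::string> import_candidates(const CombinedIndex &index,
                                        const std::string &module,
                                        unsigned max_instructions) {
    std::set<std::string> result;
    for (const auto &entry : index) {
        if (entry.second.module != module)
            continue;
        for (const std::string &callee : entry.second.calls) {
            auto it = index.find(callee);
            if (it == index.end())
                continue; // no summary available, e.g. a library function
            if (it->second.module == module)
                continue; // already defined locally
            if (it->second.instruction_count <= max_instructions)
                result.insert(callee);
        }
    }
    return result;
}
```

The point of the sketch is the shape of the computation: the serial step touches only small per-function records and call edges, which is why it can stay thin and fast.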

Current Status

ThinLTO is currently supported in both the gold plugin as well as in ld64 starting with Xcode 8. Additionally, support is currently being added to the lld linker. The 3.9 release of clang will have ThinLTO accessible using the -flto=thin command line option.

While tuning is still in progress, ThinLTO already performs well compared to LTO, in many cases matching the performance improvement. In a few cases ThinLTO even outperforms full LTO, most likely because the higher scalability of ThinLTO allows using a more aggressive backend optimization pipeline (similar to that of a non-LTO build).

The following results were collected for the C/C++ SPEC cpu2006 benchmarks on an 8-core 2.6GHz Intel Xeon E5-2689. Each benchmark was run in isolation three times and results are shown for the average of the three runs.

Critically, due to the scalable design of ThinLTO, this performance is achieved with a build time that stays within a non-LTO build scale. The following build times were collected on a 20 core 2.8GHz Intel Xeon CPU E5-2680 v2, running Linux and using the gold linker. The results are for an end-to-end build of clang (ninja clang) from a clean build directory, so it includes all the compile steps and links of intermediate binaries such as llvm-tblgen and clang-tblgen.

Release build shows how ThinLTO build time is very comparable to a non-LTO build. Adding -gline-tables-only adds a very small overhead, and ThinLTO is again similar to the regular non-LTO build. However with full debug information, ThinLTO is still somewhat slower than a non-LTO build due to the additional overhead during importing. Ongoing improvements to debug metadata representation and handling are expected to continue to reduce this overhead. In all cases, full LTO is actually significantly slower.

On the memory consumption side, the improvements are significant. Over the last two years, FullLTO was significantly improved, as shown on the chart below, but our measurement shows that ThinLTO keeps a large advantage.

Usage Information

To utilize ThinLTO, simply add the -flto=thin option to compile and link. E.g.

% clang -flto=thin -O2 file1.c file2.c -c
% clang -flto=thin -O2 file1.o file2.o -o a.out

As mentioned earlier, by default the linkers will launch the ThinLTO backend threads in parallel, passing the resulting native object files back to the linker for the final native link. As such, the usage model is the same as non-LTO. Similar to regular LTO, for Linux this requires using the gold linker configured with plugins enabled or ld64 starting with Xcode 8.

Distributed Build Support

To take advantage of a distributed build system, the parallel ThinLTO backends can each be launched as a separate process. To support this, the gold plugin provides a thinlto-index-only option that causes the link to exit after creating the combined index and performing global analysis.

Additionally, in this mode:

Instead of using a monolithic combined index, a separate individual index file is written per backend containing the necessary portions of the combined index for recording the imports and any other global summary based optimization decisions that should be acted on in the backend.
A plain text listing of the bitcode files each module will import from is optionally emitted to aid in distributed build file staging (thinlto-emit-imports-files plugin option).

The backends can be launched by invoking clang on the bitcode and providing its index via an option. Finally, the resulting native objects are linked to generate the final binary. For example:

% clang -flto=thin -O2 file1.c file2.c -c
% clang -flto=thin -O2 file1.o file2.o -Wl,-plugin-opt,-thinlto-index-only
% clang -O2 -o file1.native.o -x ir file1.o -c -fthinlto-index=./file1.o.thinlto.bc
% clang -O2 -o file2.native.o -x ir file2.o -c -fthinlto-index=./file2.o.thinlto.bc
% clang file1.native.o file2.native.o -o a.out

Incremental ThinLTO Support

With full LTO, only the initial compile steps can be performed incrementally. If any input has changed, the expensive serial IPA/IPO step must be redone.

With ThinLTO, the serial Thin Link step must be redone if any input has changed; however, as noted earlier, this step is small and fast and does not involve loading any module. Any particular ThinLTO backend must be redone iff:

The corresponding (primary) module’s bitcode changed
The list of imports into or exports from the module changed
The bitcode for any module being imported from has changed
Any global analysis result affecting either the primary module or anything it imports has changed.

For single machine builds, where the threads are launched by the linker, incremental builds can be achieved by caching the module after applying the global summary based optimizations such as importing, using a hash of the information listed above as the key. This caching is already supported in libLTO’s ThinLTO handling, which is used by ld64. To enable it, the link step needs to be passed an extra flag: -Wl,-cache_path_lto,/path/to/cache

For distributed builds, the above information in items 2-4 is all serialized into the individual index files. So the build system can compare the contents of the input bitcode files (the primary module’s bitcode and any it imports from) along with the combined index against those from an earlier build to decide if a particular ThinLTO backend must be redone. To make this process more efficient, the content of the bitcode file is hashed when emitted during the compile phase, and the result is stored in the bitcode file itself so that the cache can be queried during the Thin Link step without reading the IR.
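The idea of keying the cache on a hash of those four inputs can be sketched as follows. This is not how libLTO computes its key: BackendInputs, cache_key and the use of FNV-1a as the hash are assumptions chosen to keep the example short.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a, used here only as a simple stand-in for a real content hash.
std::uint64_t fnv1a(const std::string &data,
                    std::uint64_t hash = 14695981039346656037ULL) {
    for (unsigned char c : data) {
        hash ^= c;
        hash *= 1099511628211ULL;
    }
    return hash;
}

// Hypothetical record of everything that can invalidate one backend run.
struct BackendInputs {
    std::string module_hash;                         // primary module bitcode
    std::vector<std::string> imports;                // imported functions
    std::vector<std::string> exports;                // exported functions
    std::vector<std::string> imported_module_hashes; // modules imported from
    std::string analysis_results;                    // global analysis output
};

// Folds every input into one key; a change to any field changes the key
// (up to hash collisions), so a stale cache entry is not reused.
std::uint64_t cache_key(const BackendInputs &in) {
    std::uint64_t h = fnv1a(in.module_hash);
    auto mix_list = [&h](const std::vector<std::string> &items) {
        h = fnv1a("|", h); // separator between lists
        for (const std::string &item : items)
            h = fnv1a(item + ";", h);
    };
    mix_list(in.imports);
    mix_list(in.exports);
    mix_list(in.imported_module_hashes);
    h = fnv1a("|" + in.analysis_results, h);
    return h;
}
```

A build would look up the cached native object under this key and rerun the backend only on a miss.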

The chart below illustrates the full build time of clang in three different situations:

The full link following a clean build.
The developer fixes the implementation of DenseMap::grow(). This is a widely used header in the project, which forces a rebuild of a large number of files.
The developer fixes the implementation of visitCallInst() in InstCombineCalls.cpp. This is an implementation file, so the incremental build should be fast.

These results illustrate how full LTO is not friendly to incremental builds, and show how ThinLTO provides an incremental link time very close to that of a non-LTO build.

LLVM Weekly - #129, Jun 20th 2016

Mon, 20 Jun 2016 04:23:00 +0000

Welcome to the one hundred and twenty-ninth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Last week was WWDC, which featured talks on what's new in LLVM (slides) and what's new in Swift (slides). Note that the embedded video player suggests you need Safari or the WWDC app to stream the video, but you can find a downloadable version under the "resources" tab.

On the mailing lists

Jason Henline has announced the LLVM parallel-libs subproject which will "host the development of libraries which are aimed at enabling parallelism in code and which are also closely tied to compiler technology. Examples of libraries suitable for hosting within the parallel-libs subproject are runtime libraries and parallel math libraries. The initial candidates for inclusion in this subproject are StreamExecutor and libomptarget which would live in the streamexecutor and libomptarget subdirectories of parallel-libs, respectively."
One of the most active threads this week was about whether the release following 3.9 should be 4.0. Much of the discussion was around whether the move from 3.9 to 4.0 should come with a large change breaking IR compatibility. Chris Lattner suggests a sliding window of IR compatibility may be better.
TB Schardl has posted an RFC on upstreaming the CSI framework ("Comprehensive Static Instrumentation"). The code is now up for review. This framework makes it easy to implement dynamic analysis tools, often without needing compiler changes.
Ashutosh Nema has shared an RFC on strided memory access vectorisation.
In response to a question on the mailing list, Hubert Tong has given a brain dump on the status of work on concepts support in Clang including opportunities for getting involved.
Paweł Bylica has asked for advice on dealing with LLVM as a project dependency. In particular, is it worth investigating CMake's ExternalProject module? Chris Bieneman has shared some advice.
Michael Kuperstein has posted an RFC on allowing the loop vectorizer to choose vector widths that generate illegal types. The feedback appears to be positive so far.

LLVM commits

FileCheck learnt the --check-prefixes option as a shorthand for multiple --check-prefix options. r272670.
A local_unnamed_addr attribute was introduced. This can be used by the code generator and LTO to allow the linker to decide whether the global needs to be in the symbol table. r272709.
The ScalarReplAggregates pass has been removed as it has been superseded by SROA for a long time. r272737.
LLVM's C API gained support for string attributes. r272811.
Assembly parsing and lexing has seen some cleanups. r273007.

Clang commits

A new loop distribution pragma was added. Loop distribution is a transformation which attempts to break a loop into multiple loops with each taking part of the loop body. r272656.
The nodebug attribute can now be applied to local variables. r272859.
The validity check for MIPS CPU/ABI pairings is now performed at initialisation time and a much clearer message is printed. r272645.
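
To make the loop distribution item above concrete, here is a small sketch of the transformation written by hand in Python. It is only an illustration of the idea, not what Clang emits: the function names and the choice of loop body are made up for the example. The original loop mixes a statement with a loop-carried dependence and an independent statement; the distributed form gives each its own loop.

```python
def fused(a, b, c):
    """Original loop: a running sum and an independent product share one body."""
    n = len(a)
    running = [0] * n
    product = [0] * n
    for i in range(n):
        # Depends on the previous iteration, so it must run in order.
        running[i] = (running[i - 1] if i > 0 else 0) + a[i]
        # Independent across iterations.
        product[i] = b[i] * c[i]
    return running, product


def distributed(a, b, c):
    """Distributed form: each statement gets its own loop."""
    n = len(a)
    running = [0] * n
    product = [0] * n
    for i in range(n):
        running[i] = (running[i - 1] if i > 0 else 0) + a[i]
    for i in range(n):
        product[i] = b[i] * c[i]
    return running, product
```

Both functions compute the same result. The benefit of the second form is that the independent loop no longer carries the serial dependence along with it, which can make it a candidate for vectorisation.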

Other project commits

A complete implementation of the C++ Filesystem TS has been checked in. r273034.
LLD's ARM port gained initial support for Thumb with ARMv7a. r272881.

Using LNT to Track Performance

Wed, 15 Jun 2016 23:19:00 +0000

In the past year, LNT has grown a number of new features that make performance tracking and understanding the root causes of performance deltas a lot easier. In this post, I’m showing how we’re using these features.

LNT contains 2 big pieces of functionality:

A server,
a. to which you can submit correctness and performance measurement data, by sending it a json-file in the correct format,
b. that analyzes which performance changes are significant and which ones aren't,
c. that has a webui to show results and analyses in a number of different ways.
A command line tool to run tests and benchmarks, such as LLVM’s test-suite, SPEC2000 and SPEC2006 benchmarks.

This post focuses on using the server. None of the features I’ll show are LLVM-specific, or even specific to ahead-of-time code generators, so you should be able to use LNT in the same way for all your code performance tracking needs. At the end, I’ll give pointers to the documentation needed to set up an LNT server and how to construct the json file format with benchmarking and profiling data to be submitted to the server.
The features highlighted focus on tracking the performance of code, not on other aspects LNT can track and analyze.
We have 2 main use cases in tracking performance:

Post-commit detection of performance regressions and improvements.
Pre-commit analysis of the impact of a patch on performance.

I'll focus on the post-commit detection use case.

Post-commit performance tracking

Step 1. Get an overview of the "Daily Report" page

Assuming your server runs at http://yourlntserver:8000, this page is located at http://yourlntserver:8000/db_default/v4/nts/daily_report

The page gives a summary of the significant changes it found today.

An example of the kind of view you can get on that page is the following

In the above screenshot, you can see that there were performance differences on 3 different programs, bigfib, fasta and ffbench. The improvement on ffbench only shows up on a machine named “machine3”, whereas the performance regression on the other 2 programs shows up on multiple machines.

The table shows how performance evolved over the past 7 days, one column for each day. The sparkline on the right shows graphically how performance has evolved over those days. When the program was run multiple times to get multiple sample points, these show as separate dots that are vertically aligned (because they happened on the same date). The background color in the sparkline represents a hash of the program binary. If the color is the same on multiple days, the binaries were identical on those days.

Let’s look first at the ffbench program. The background color in the sparkline is the same for the last 2 days, so the binary for this program didn’t change in those 2 days. Conclusion: the reported performance variation of -8.23% is caused by noise on the machine, not due to a change in code. The vertically spread out dots also indicate that this program has been noisy consistently over the past 7 days.

Let’s now look at the bigfib. The background color in the sparkline has changed since its previous run, so let’s investigate further. By clicking on one of the machine names in the table, we go to a chart showing the long-term evolution of the performance of this program on that machine.
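
The reasoning applied to ffbench and bigfib above can be written down as a small triage rule. This is a sketch of the manual reasoning only, not LNT's own analysis: the function name, the hash strings, the 1% threshold and the 5.0% delta used for bigfib are all hypothetical.

```python
def classify_change(prev_binary_hash, curr_binary_hash, delta_pct,
                    threshold_pct=1.0):
    """Triage a reported performance delta between two consecutive runs."""
    if abs(delta_pct) < threshold_pct:
        return "insignificant"
    # Identical binaries cannot differ because of a codegen change,
    # so any measured delta has to come from the machine.
    if prev_binary_hash == curr_binary_hash:
        return "noise"
    return "investigate"


# ffbench: same binary on both days, -8.23% reported.
print(classify_change("a1b2", "a1b2", -8.23))  # noise
# bigfib: the binary changed (hypothetical delta value).
print(classify_change("a1b2", "c3d4", 5.0))    # investigate
```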

Step 2. The long-term performance evolution chart

This view shows how performance has evolved for this program since we started measuring it. When you click on one of the dots, which each represent a single execution of the program, you get a pop-up with information such as revision, date at which this was run etc.

When you click on the number after “Run:” in that pop-up, it’ll bring you to the run page.

Step 3. The Run page

The run page gives an overview of a full “Run” on a given machine. Exactly what a Run contains depends a bit on how you organize the data, but typically it consists of many programs being run a few times on 1 machine, representing the quality of the code generated by a specific revision of the compiler on one machine, for one optimization level.

This run page shows a lot of information, including performance changes seen since the previous run:

When hovering with the mouse over entries, a “Profile” button appears that, when clicked, shows profiles of both the previous run and the current run.

Step 4. The Profile page

At the top, the page gives you an overview of differences of recorded performance events between the current and previous run.

After selecting which function you want to compare, this page shows you the annotated assembly:

While it’s clear that there are differences between the disassembly, it’s often much easier to understand the differences by reconstructing the control flow graph to get a per-basic-block view of differences. By clicking on the “View:” drop-down box and selecting the assembly language you see, you can get a CFG view. I find showing absolute values rather than relative values helps to understand performance differences better, so I also chose “Absolute numbers” in the drop down box on the far right:

There is obviously a single hot basic block, and there are differences in instructions in the 2 versions. The number in the red side-bar shows that the number of cycles spent in this basic block has increased from 431M to 716M. In just a few clicks, I managed to drill down to the key codegen change that caused the performance difference!
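
As a quick worked example of the numbers quoted above, the relative change in cycles for that basic block can be computed directly. The helper name is made up for the illustration; the two cycle counts are the ones from the text.

```python
def relative_change_pct(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


cycles_before = 431e6  # previous run
cycles_after = 716e6   # current run
print("%.1f%%" % relative_change_pct(cycles_before, cycles_after))  # 66.1%
```

So the hot block costs roughly two thirds more cycles in the current run than in the previous one.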

We combine the above workflow with the llvmbisect tool available at http://llvm.org/viewvc/llvm-project/zorg/trunk/llvmbisect/ to also quickly find the commit introducing the performance difference. We find that using both the above LNT workflow and the llvmbisect tool is vital to be able to act quickly on performance deltas.
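
The idea behind bisecting for the offending commit is a binary search over the revision range. The sketch below shows that idea only; it is not llvmbisect's implementation or interface, and the function name, the predicate and the revision numbers are hypothetical.

```python
def first_bad_revision(revisions, is_bad):
    """Find the first revision for which is_bad() holds.

    Assumes revisions are sorted oldest to newest, the first one is known
    good, the last one is known bad, and the regression persists once
    introduced.
    """
    lo, hi = 0, len(revisions) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]


# Hypothetical range of 100 revisions with a regression introduced at 269042.
revs = list(range(269000, 269100))
print(first_bad_revision(revs, lambda r: r >= 269042))  # 269042
```

In practice `is_bad` would build the compiler at that revision, run the benchmark, and compare against a threshold, so each probe is expensive; the search needs only about log2(N) of them.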

Pointers on setting up your own LNT server for tracking performance

Setting up an LNT server is as simple as running the half-dozen commands documented at http://lnt.llvm.org/quickstart.html under "Installation" and "Viewing Results". The "Running tests" section is specific to LLVM tests; the rest is generic to performance tracking of general software.

The documentation for the json file format to submit results to the LNT server is here: http://lnt.llvm.org/importing_data.html.
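
As a rough idea of what constructing such a file might look like, here is a sketch that builds a report with the standard json module. Every key name in it ("Machine", "Run", "Tests", "run_order" and so on), the test naming scheme, and all sample values are assumptions for illustration; the documentation linked above is the authority on the actual schema.

```python
import json


def make_report(machine, run_order, start, end, samples):
    """Build a report dict. All key names are assumptions to verify."""
    tests = [
        {"Name": "nts.%s.exec" % program, "Info": {}, "Data": list(times)}
        for program, times in sorted(samples.items())
    ]
    return {
        "Machine": {"Name": machine, "Info": {}},
        "Run": {
            "Start Time": start,
            "End Time": end,
            "Info": {"run_order": str(run_order), "tag": "nts"},
        },
        "Tests": tests,
    }


# Hypothetical machine name, revision, timestamps and timings.
report = make_report(
    "machine3", 272000, "2016-06-15 10:00:00", "2016-06-15 10:30:00",
    {"bigfib": [1.52, 1.49], "ffbench": [0.81, 0.88, 0.79]},
)
payload = json.dumps(report, indent=2)
```

Recording several samples per program, as done here, is what produces the vertically aligned dots in the sparklines described earlier.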

The documentation for how to also add profile information is at http://lnt.llvm.org/profiles.html.

LLVM Weekly - #128, June 13th 2016

Mon, 13 Jun 2016 05:08:00 +0000

Welcome to the one hundred and twenty-eighth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

LDC, a compiler for the D programming language with an LLVM backend, has a major release with 1.0.0. The big news with this release is that the frontend is now completely written in D. Congratulations to everyone involved in this release. See the D website for more information about the D programming language.

The minor release LLVM 3.8.1-rc1 has been tagged.

On the mailing lists

Gor Nishanov has shared an RFC on adding support for coroutines to LLVM IR.
Hans Wennborg has shared the release plan for the 3.9 release. This plan would see the release branch created on the 18th of July and the final release shipping on the 22nd of August. Hans wonders whether, as with 2.9 and 1.9, the base version number will be incremented (making release 4.0). He'd also like to make LLVM's current release cadence "official" on the website and publicly list people who are currently committed to testing releases.
Chris Bieneman is proposing some changes to the LLVM directory structure. Specifically, adding a 'runtimes' subdirectory and removing the 'projects' subdirectory. Chris has helpfully summarised some of the key feedback and proposed how to move forwards.
Sebastian Pop is trying to get a feel for how many LLVM developers are attending HPCA/PPoPP/CGO in February 2017 in order to estimate potential numbers for an LLVM gathering.
In answer to a question, Krzysztof Parzyszek explains how to use glue values.
Philip Reames provides a handy explanation of what it means for allocations to "escape", be "captured" or be "thread local".
Simon Brand is seeking feedback on how to enhance LLDB to better support HSA applications.
Simon Cook describes how he has set up pseudo-registers representing register locations on a very register-poor architecture.
Vikram TV has shared a proposal on adding a pass that calculates loop cost based on cache data. The prototype patch analyses references to determine which would be in the same cache line. This knowledge can then be used to calculate a more accurate loop cost. A drawback of the current implementation is that it uses a static cache line size.
Sean Silva has kicked off a thread about the intended behaviour of the CGSCC pass manager. This manages passes over strongly connected components of the callgraph.
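
For readers unfamiliar with the term used in the last item, the strongly connected components of a call graph group together functions that call each other, directly or indirectly. The sketch below computes them with Tarjan's algorithm on a toy graph. It is not LLVM's CGSCC implementation; the function names and the graph are made up, and returning the components callees-first is simply a property of this algorithm.

```python
def call_graph_sccs(graph):
    """Return the SCCs of a call graph, callees before callers.

    graph maps each function name to the list of functions it calls.
    """
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    sccs = []
    counter = [0]

    def visit(f):
        index[f] = lowlink[f] = counter[0]
        counter[0] += 1
        stack.append(f)
        on_stack.add(f)
        for callee in graph.get(f, ()):
            if callee not in index:
                visit(callee)
                lowlink[f] = min(lowlink[f], lowlink[callee])
            elif callee in on_stack:
                lowlink[f] = min(lowlink[f], index[callee])
        if lowlink[f] == index[f]:
            # f is the root of a component: pop everything above it.
            scc = []
            while True:
                g = stack.pop()
                on_stack.discard(g)
                scc.append(g)
                if g == f:
                    break
            sccs.append(sorted(scc))

    for f in graph:
        if f not in index:
            visit(f)
    return sccs


# "a" and "b" are mutually recursive, so they form one component.
toy = {"main": ["a"], "a": ["b"], "b": ["a", "leaf"], "leaf": []}
print(call_graph_sccs(toy))  # [['leaf'], ['a', 'b'], ['main']]
```

Treating a mutually recursive group as one unit matters because a change to any one of its functions can affect how the others are optimised.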

LLVM commits

Some of the work from the GSoC project on interprocedural register allocation has started to land. A RegUsageInfoCollector analysis was added that collects the list of clobbered registers for a MachineFunction. A new transformation pass was committed which scans the body of a function to find calls and updates the register mask with the one saved by RegUsageInfoCollector. r272403, r272414.
Chapter 2 of the tutorial on building a JIT with ORC has been fleshed out with a rough draft of the text. r271885.
The host CPU detection code for x86 has seen a large refactoring. r271921.
More documentation has been added about LLVM's CodeView support. r272057.
llvm-symbolizer will now be searched for in the same directory as the LLVM or Clang tool being executed. This increases the chance of being able to print pretty backtraces for systems where LLVM tools aren't installed in the $PATH. r272232.

Clang commits

Clang analyzer gained a checker for correct usage of the MPI API in C and C++. r271907.
Documentation was added on avoiding static initializers when using profiling. r272067, r272214.

Other project commits

A hardened allocator, 'scudo' was added to compiler-rt. It attempts to mitigate some common heap-based vulnerabilities. r271968.
Initial support for ARM has landed in LLD. This is just enough to link a hello world on ARM Linux. r271993.
Initial support for AddressSanitizer on Win64 was added. r271915.

LLVM Weekly - #127, June 6th 2016

Mon, 06 Jun 2016 08:02:00 +0000

Welcome to the one hundred and twenty-seventh issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Graham Markall at Embecosm has been comparing the code size of RISC-V binaries produced by the GCC and LLVM ports, as well as compared to ARM. GCC is currently ahead, though it is worth noting the LLVM port has seen much less attention.

Matthias Reisinger is a Google Summer of Code student working on enabling polyhedral optimisations for the Julia programming language. He's written a blog post detailing his initial steps and immediate future plans. Hopefully we'll see more posts over the summer.

Loïc Hamot has been working on a C++ to D converter, implemented using Clang.

The MSVC team have blogged about the latest release of Clang with Microsoft CodeGen, based on Clang 3.8.

There is going to be a clang-tidy code dojo in Warsaw on Tuesday the 7th of June.

On the mailing lists

Renato Golin has kicked off a discussion on moving LLVM's repository hosting to GitHub. Chris Lattner came out in favour, specifically motivated by GitHub's community aspects. Renato has very helpfully summarised the discussion.
Peter Smith has shared his initial port of LLD to ARM along with his planned roadmap. The currently submitted patch is just enough to link a hello world executable on ARM Linux.
The apt repo hosted at llvm.org has been temporarily turned off as it is resulting in excessive I/O and network activity. Some commenters ask about setting up an Ubuntu PPA or using the OpenSUSE build service.
David Blaikie has written a GDB pretty printer script for some common LLVM types and described how to use it.
Michael LeMay has posted an RFC on using segmentation to harden SafeStack.
Rui Ueyama has been investigating using sendfile to copy file contents in LLD and shares his results. He concludes the performance improvement is too modest to be worth the change.
If you're interested in register allocation, then delve in to this thread on LLVM's PBQP allocator and copy propagation.
Daniel Dunbar is suggesting some changes to the lit default output.
Steven Wu shares a follow-up RFC on embedded bitcode.
There's a useful discussion in this thread on lowering loops to use a hardware loop instruction.
Peter Collingbourne proposes renaming and slightly redesigning the bitset metadata.

LLVM commits

LLVM gained support for 'SJLJ' (setjmp/longjmp) exception handling on x86 targets. r271244.
LLVM now requires CMake 3.4.3 to build. r271325.
Support was added for attaching metadata to global variables. r271348.
The AArch64 backend switched to use SubtargetFeatures rather than testing for specific CPUs. r271555.

Clang commits

The release notes have been updated to explain the current level of OpenMP support (full support of non-offloading features of OpenMP 4.5). r271263.
Clang's source-based code coverage has been documented. r271454.

Other project commits

An -fno-exceptions libc++abi library variant was defined, to match the -fno-exceptions libc++ build. r271267.
LLDB's compact unwind printing tool gained support for ARMv7's compact unwind format. r271744.

LLVM Weekly - #126, May 30th 2016

Mon, 30 May 2016 12:07:00 +0000

Welcome to the one hundred and twenty-sixth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

I've been moving house this weekend, so do accept my apologies if you find this issue to be a little less thorough than usual.

News and articles from around the web

Pyston, the LLVM-based Python compiler has released version 0.5. The main changes are a switch to reference counting and NumPy compatibility.

I don't want to become "C++ weekly", but I think this audience appreciates a fun use of C++ features. Verdigris is a header-only library that allows you to use Qt5 without the moc preprocessor.

The call for papers for the 3rd workshop on the LLVM compiler infrastructure in HPC has been published. The deadline for paper submission is September 1st. The workshop will take place on November 14th in Salt Lake City, and is held in conjunction with SC16.

On the mailing lists

Vivek Pandya, a GSoC student working on interprocedural register allocation has shared a weekly status report.
Rafael Espíndola has proposed creating a bitcode symbol table.
There's been some updates on the progress of open-sourcing PGI's Fortran frontend.
Elena Lepilkina has proposed some enhancements to FileCheck. Some questions were raised about how useful the proposed extensions will be. Sergey Yakoushkin provided more background on how these features are used in a commercial codebase. As Elena notes, these features don't need to all be upstreamed at once (or at all), and are mostly independent.
Lang Hames has posted a heads-up about upcoming breaking API changes for ORC and MCJIT.
Sean Silva has kicked off a discussion on the state of IRPGO. You might ask what is IRPGO? This is profile-guided optimisation performed through instrumentation at the LLVM IR level, as opposed to FEPGO where instrumentation is added by the frontend (e.g. Clang), prior to lowering to IR. Sean would like to make IRPGO the default on all platforms other than Apple at the moment (who may require a longer deprecation period). A number of followup comments discuss possibilities for ensuring all platforms can move forward together, and ensuring a sensible flag exists to choose between frontend or middle-end PGO.
What exactly is a register pressure set? Both Quentin Colombet and Andrew Trick have answers for us.

LLVM commits

New optimisations covering checked arithmetic were added. r271152, r271153.
Advanced unrolling analysis is now enabled by default. r270478.
The initial version of a new chapter to the 'Kaleidoscope' tutorial has been committed. This describes how to build a JIT using ORC. r270487, r271054.
The data flow analysis in LLVM's stack colouring has been rewritten in order to increase the number of stack variables that can be overlapped. r270559.
Parts of EfficiencySanitizer are starting to land, notably instrumentation for its working set tool. r270640.
SelectionDAG learned how to expand multiplication for larger integer types where there isn't a standard runtime call to handle it. r270720.
LLVM will now report more accurate loop locations in optimisation remarks by reading the starting location from llvm.loop metadata. r270771.
Symbolic expressions are now supported in assembly directives, matching the behaviour of the GNU assembler. r271102.
Symbols used by plugins can now be auto-exported on Windows, which improves support for plugins in Windows. See the commit message for a full description. r270839.
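
The multiplication expansion mentioned in the SelectionDAG item above rests on a standard identity: a wide product can be assembled from products of narrower halves. The sketch below shows that arithmetic for a 128-bit multiply built from 64-bit pieces. It illustrates the mathematics only and is not the SelectionDAG code; the names and the choice of widths are assumptions for the example.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def mul128(a, b):
    """Low 128 bits of a * b, using only 64-bit halves of the operands."""
    a_lo, a_hi = a & MASK64, (a >> 64) & MASK64
    b_lo, b_hi = b & MASK64, (b >> 64) & MASK64
    # Full 128-bit product of the two low halves.
    low = a_lo * b_lo
    # The cross terms are shifted up by 64, so only their low 64 bits survive.
    cross = (a_lo * b_hi + a_hi * b_lo) & MASK64
    # a_hi * b_hi would be shifted by 128 bits and is dropped entirely.
    return (low + (cross << 64)) & MASK128
```

The result matches `(a * b) % 2**128` for any pair of 128-bit inputs, which is easy to spot-check with Python's arbitrary-precision integers.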

Clang commits

Software floating point for Sparc has been exposed in Clang through -msoft-float. r270538.
Clang now supports the -finline-functions argument to enable inlining separately from the standard -O flags. r270609.

Other project commits

SectionPiece in LLD is now 8 bytes smaller on 64-bit platforms. This improves the time to link Clang with debug info by 2%. r270717.
LLD has replaced a use of binary search with a hash table lookup, resulting in a 4% speedup when linking Clang with debug info. r270999.
LLDB now supports AArch64 compact unwind tables, as used on iOS, tvOS and watchOS. r270658.

LLVM Weekly - #125, May 23rd 2016

Mon, 23 May 2016 03:57:00 +0000

Welcome to the one hundred and twenty-fifth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Stephen Kelly has written a blog post about using Clang through the cindex API to automatically generate Python bindings. He also makes use of SIP.

Krister Walfridsson has written a wonderfully clear post on C's type-based aliasing rules.

This week I discovered the Swift Weekly Brief newsletter. Its author, Jesse Squires does a wonderful job of summarising mailing list traffic, recent commits, and discussions on swift-evolution proposals. If you have an interest in Swift development or language design in general I highly recommend it.

Are you interested in writing for the LLVM blog? Or volunteering to help recruit content authors? If so, get in touch with Tanya.

The next Cambridge LLVM Social will be held at 7.30pm on May 25th at the Cambridge Blue.

On the mailing lists

Elena Demikhovsky is interested in extending scalar evolution (SCEV) analysis to include floating point support. This kicked off a pretty interesting discussion. Sanjoy Das highlighted what he sees as the most important issues to discuss. A number of follow-ups discussed whether enough code uses floating point values as an induction variable to be worth optimising. There was also the question of should vectorisation be pursued at any cost? Even if a loop can be made vectorisable through loop-versioning with run-time checks, is it worth the code size? Is the cost of maintaining the compiler code worthwhile? Hideki Saito posted a useful summary of the discussion so far.
Chandler Carruth is looking for feedback on the idea of supporting horizontal operations on vector types such as sum directly in LLVM IR. Everyone who has responded so far is in favour.
Jia Chen, GSoC student with LLVM, has noted the CFL-AA pass seems to be mostly working now and would appreciate reports from people trying it out on their codebases. So far, Geoff Berry reports no correctness issues but seemingly very limited changes in the generated code for SPEC and the LLVM test-suite.
Adam Nemet is seeking feedback on the idea of adding optimisation remarks to indicate where non-temporal stores may be profitable.
Quentin Colombet has summarised recent discussion on policies to help release management and detailed the automatic hooks he hopes to explore next for updating bugs when referenced in a commit message. The following discussion looked at how these hooks may be implemented and what level of rigidity would be most beneficial to the community.
Dean Michael Berris is looking for a way of defining a default implementation for a pseudo-instruction. No answers yet, but hopefully that will change soon!
Galina Kistanova is doing some cleanup work on zorg (the buildbot-based testing infrastructure of the LLVM project) and is interested whether anyone uses these seemingly stale modules.

LLVM commits

llc will now report all errors in the input file rather than just exiting after the first. r269655.
The SPARC backend gained support for soft floating point. r269892.
Reloc::Default no longer exists. Instead, Optional<Reloc> is used. r269988.
An initial implementation of a "guard widening" pass has been committed. This will combine multiple guards to reduce the number of checks at runtime. r269997.
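
The guard widening item above can be illustrated with a small hand-written example. This is a sketch of the general idea only, under the assumption that a failing guard may be reported earlier than in the original code; it is not the LLVM pass, and the `guard` helper, the exception type and the function names are hypothetical.

```python
class GuardFailed(Exception):
    """Stands in for leaving the optimised code when a guard fails."""


def guard(condition):
    if not condition:
        raise GuardFailed()


def load_pair(arr, i):
    """Original form: one guard in front of each access."""
    guard(0 <= i < len(arr))
    first = arr[i]
    guard(0 <= i + 1 < len(arr))
    second = arr[i + 1]
    return first, second


def load_pair_widened(arr, i):
    """Widened form: a single, stronger guard covers both accesses."""
    guard(0 <= i and i + 1 < len(arr))
    return arr[i], arr[i + 1]
```

The single condition implies both original ones, so the widened version performs one check instead of two while accepting exactly the same inputs.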

Clang commits

clang-include-fixer gained a basic Vim integration. r269927.
The intrinsics headers now have feature guards enabled in Microsoft mode to combat the compile-time regression discussed last week due to their increased size. r269675.
avxintrin.h gained many new Doxygen comments. r269718.

Other project commits

lld now lets you specify a subset of passes to run in LTO. r269605.
LLDB has replaced uses of its own Mutex class with std::mutex. r269877, r270024.

LLVM Weekly - #124, May 16th 2016

Mon, 16 May 2016 04:26:00 +0000

Welcome to the one hundred and twenty-fourth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The main news this week is the announcement of Scala-native, an ahead-of-time compiler for Scala using LLVM. Jos Dirkens has written a getting started guide if you want to compile it and try it out. There's also more information in the slides from the announcement talk.

On the mailing lists

More of the students taking part in Google Summer of Code with LLVM-related projects have introduced themselves and their plans. Vivek Pandya will be working on interprocedural register allocation. Scott Egerton will be working on capture tracking improvements. Jie Chen will be working on better alias analysis, specifically improving cfl-aa. Matthias Reisinger will be working on enabling polyhedral optimisations in Julia, and Zhengyan Liu has plans for SAFECode memory hardening.
Renato Golin kicked off a discussion about whether LLVM's release process could be better aligned with downstream users. This thread covered a broad range of topics and triggered a lot of discussion, but luckily there's no need to summarise it as Renato has done the job for us.
Nicolai Hähnle notes that currently libLLVM.so contains about 1.7MB in its .data.rel.ro section, of which about 1.3MB comes from the MCInstrDesc tables created by tablegen representing a massive number of pointers to be relocated. He suggests reducing this by using offsets instead. Reducing the relocations will both reduce binary size and increase the portion of the binary that can be mapped as shared. So far, responses to the thread are supportive of the idea.
James Knight has written a detailed post on how it's not really possible to write an LL/SC loop guaranteed to make forward progress in LLVM IR right now. There are restrictions on what you can do between a load-linked and a store-conditional instruction that the code generator may not meet.
A public llvm-foundation mailing list has been announced, to facilitate discussions related to the Foundation.
As well as the long, technically detailed and precise threads each week it's nice to highlight cases where a simple question has a simple answer. How do you register a pass as being opt-in based on a command-line flag? Answer: have it run every time, but return immediately if the desired command line flag isn't present.
Sanjoy Das has shared an RFC on adding a callee-saved register verifier. As is clarified later in the thread, the intention is to ensure that code not generated by LLVM (e.g. output from another JIT or hand-written assembly) properly adheres to the calling convention and doesn't clobber registers it shouldn't. The proposed pass would simply add code to check that the test values written to the callee-saved registers aren't modified.
In response to questions about pass ordering, Mehdi Amini has written a helpful description of what exactly happens when you do opt -mymodulepass0 -myfunctionpass -mymodulepass1.
Konstantin Vladimirov wonders if there's an option to force the register allocator to use as many architectural registers as possible to reduce dependencies. The short answer is there isn't currently, but it would be interesting to investigate.
Diana Picus has shared an RFC on modifying llc so it no longer exits after the first error. Generally people are in favour, and the patch should hopefully land soon (it had to be temporarily backed out after exposing some test case failures in lldb).
Nico Weber has noted that now with AVX512, Clang's intrinsics headers are huge. This can cause compile time issues, for instance Nico reports building all of the v8 JS engine is 6% faster after removing the avx512 includes. The thread participants haven't yet decided on the best way forward to fix this, beyond the potential immediate step of adding include guards so AVX512 intrinsic headers aren't included when not compiling for AVX512 platforms.

LLVM commits

The outdated guide on cross-compiling LLVM has been brought up to date. r269054.
The WebAssembly backend gained preliminary fast instruction selection (fast-isel) support. r269083, r269203, r269273.
Loop unrolling (other than in the case of explicit pragmas) is now disabled at -Os in LLVM. You may recall last week it was enabled for -Os in Clang, but with different thresholds. r269124.
A new cost-tracking system has been implemented for the loop unroller. r269388.
LLVM's Sparc backend has seen the addition of more LEON-specific features, e.g. signed and unsigned multiply-accumulate. r268908.
llc's -run-pass option will now work with any pass known to the pass registry. Previously it would silently do nothing if you specify indirectly added analysis passes or passes not present in the optimisation pipeline. r269003.
WebAssembly register stackification and coloring are now run very late in the optimisation pipeline. The commit message suggests it's useful to think of these passes as domain-specific liveness-based compression rather than a conventional optimisation. r269012.
When declaring unnamed globals in textual LLVM IR, you must now assign them explicitly with e.g. @0 = global i32 42. r269096.
The internal assembler is now enabled by default for 32-bit MIPS targets. r269560.

Clang commits

Clang now supports __float128. r268898.
Clang gained a new warning that triggers when casting away calling conventions from a function. r269116.
The recently developed include-fixer tool now has documentation. r269167.

Other project commits

compiler-rt's CMake build system can now build builtins without a full toolchain, allowing you to bootstrap a cross-compiler. r268977.
LLD will now sort relocations to optimise dynamic linker performance. r269066.

LLVM Weekly - #123, May 9th 2016

Mon, 09 May 2016 01:28:00 +0000

Welcome to the one hundred and twenty-third issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

If you're in London tomorrow you may be interested in the NMI Open Source Conference. You can register until midday today. I'll be giving a brief talk on lowRISC. While on the subject of conferences, if you are interested in diversity and inclusion in computing education, you may want to check out the CAS #include diversity conference in Manchester on the 11th June.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Fabian Giesen has written a brief article explaining why compilers exploit undefined signed overflow.

The Google Open Source blog has a short piece on the XRay function call tracing system that was proposed for upstreaming last week on the LLVM mailing list.

On the mailing lists

By far the most active thread on the mailing list this week was the resumption of discussion on adding an LLVM Code of Conduct. The draft text can be found here. As well as a number of messages offering a "+1" to the current text, concerns were raised by some about the implications of "violations of this code outside these spaces may affect a person's ability to participate within them", and about how the committee enforcing the CoC will be selected.
Amos Robinson wrote to the mailing list with an optimisation missed by LLVM's current Global Value Numbering pass. Rather excitingly, Daniel Berlin reports he's working on a new GVN implementation.
Chandler Carruth has written an update on the state of work to move to the new pass manager. He notes the major missing piece at the moment is the ability to communicate invalidation information between two parts of the pass manager.
Jonas Hahnfield has shared an RFC on automatically generating non-temporal loads and stores. Some respondents are very strongly against this, suggesting it's something better left for the programmer to specify.
Some of the students taking part in Google Summer of Code this year with LLVM-related projects have been introducing themselves on the mailing list. Utpal Bora will be working on implementing Polly as an analysis pass in LLVM. Bianca-Cristina Cristescu will be working on enabling LLVM's self-hosted modules builds using libstdc++, and Roman Gareev will be improving the vectorisation process in Polly.
Chris Bieneman notes he recently introduced a new option in LLVM's CMake buildsystem that may be of particular interest to package maintainers. LLVM_DISTRIBUTION_COMPONENTS allows you to specify which components of LLVM you want to install.
Peter Collingbourne has posted an RFC on extending ThinLTO to allow a bitcode module to embed another bitcode module containing summary information for CFI and whole-program devirtualisation.
Adam Nemet is interested in feedback on the idea of filtering optimisation remarks by the hotness of the code region.
Justin Bogner has given a heads-up to out-of-tree backend maintainers that he intends to change the API of SelectionDAGISel::Select so the function directly replaces nodes rather than returning the desired replacement.
Quentin Colombet has shared an RFC on how LLVM contributors can better help release management. There's a lot of support for this direction, with most comments discussing ways of better tagging commit messages (post-commit in phabricator/bugzilla, or through getting committers to write commit messages in a certain format).

LLVM commits

LLVM's CppBackend has been removed. As the commit message says, this backend has bit-rotted to the extent that it's not useful for its original purpose and doesn't generate code that compiles. r268631.
The AVR backend has seen a large amount of code merged in to LLVM. r268722.
The MIPS backend has seen some large changes to how relocations are handled. These are now represented using MipsMCExpr instead of MCSymbolRefExpr. As someone who has done quite a lot of (out-of-tree) LLVM backend work, I've always found it odd how some architectures have globally visible enum members in include/llvm/MC/MCExpr.h. r268379.
LLVM builds should hopefully now be deterministic by default, as LLVM_ENABLE_TIMESTAMPS is now opt-in rather than opt-out. In fact, a follow-up patch removed the option altogether. r268441, r268670.
The AArch64 backend learned to combine adjustments to the stack pointer for callee-save stack memory and local stack memory. r268746.

Clang commits

Clang now supports -malign-double for x86. This matches the default behaviour on x86-64, where i64 and f64 types are aligned to 8-bytes instead of 4. r268473.
Loop unrolling is no longer completely disabled for -Os. r268509.
Clang's release notes (reflecting the state of current trunk) have been updated to say more about the state of C++1z support. r268663.

Other project commits

libcxx will now build a libc++experimental.a static library to hold symbols from the experimental C++ Technical Specifications (e.g. filesystem). This library provides no ABI compatibility. r268443, r268456.
All usage of pthreads in libcxx has been refactored into the __threading_support header, with the intention of making it easier to retarget libcxx to platforms that don't support pthreads. r268374.
libcxx gained support for the polymorphic memory resources C++ TS. r268829.

LLVM Weekly - #122, May 2nd 2016

Mon, 02 May 2016 08:31:00 +0000

Welcome to the one hundred and twenty-second issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

GCC 6.1 has been released. Perhaps the most apparent user-visible change is that the C++ frontend now defaults to C++14.

The Rust compiler has introduced a new intermediate representation, MIR, used for optimisations prior to lowering to LLVM IR.

Tanya Lattner has written about the LLVM Foundation's plans for 2016. The LLVM Foundation has established 3 main programs: Educational Outreach, Grants and Scholarships, and Women in Compilers and Tools.

On the mailing lists

Dean Michael Berris has shared an RFC on upstreaming Google's 'XRay' function call tracing system. For more information, you can read the XRay whitepaper.
Sanjoy Das has suggested generalising the AssumptionCache to AxiomCache. He proposes maintaining separate lists of guards and assumptions within the AxiomCache.
There's been some more activity in response to Phil Tomson's question about instruction scheduling. Christof Douma followed up with some advice.
Chris Bieneman has suggested raising the CMake minimum version to 3.4.3. Renato Golin flagged up some concerns, with Chandler Carruth offering a counterpoint.
Discussion has continued on the proposal to introduce a new LLVM sub-project for parallelism runtime and support libraries. This is probably best summarised by reading Hal Finkel's thoughts on the way forward, C Bergström's concerns, and Jason Henline's proposed charter for the subproject.
Peter Collingbourne has shared an RFC on redesigning the LLD symbol table in order to improve memory locality.

LLVM commits

LLVM now supports indirect call promotion based on value-profile information. This will promote indirect calls to a direct call guarded by a precondition. r267815.
The LLVM documentation has been extended with a CMake primer covering the basics of the CMake scripting language. r268096.
The PDB dumper has been refactored into a library. r267431.
The MinLatency attribute has been removed from SchedMachineModel. r267502.
CodeGenPrepare will now use branch weight metadata to decide if a select should be turned into a branch. r267572.
Support for llvm.loop.distribute.enable metadata was added. This indicates a loop should be split into multiple loops. r267672.
The SystemZ backend now supports the Swift calling convention. r267823.
libFuzzer's documentation has been expanded and improved. r267892.
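
To make the indirect call promotion item above more concrete, here is a minimal hand-written sketch of the transformation in C++. It is not LLVM's implementation: the function names (add_one, times_two, call_promoted) are hypothetical, and the "profile" is simply assumed to say that add_one is the hot target.

```cpp
#include <cassert>

// Hypothetical callees, standing in for targets recorded in a value profile.
int add_one(int x) { return x + 1; }
int times_two(int x) { return x * 2; }

using Callback = int (*)(int);

// Before promotion: every call goes through the function pointer.
int call_indirect(Callback fn, int x) { return fn(x); }

// After promotion (written by hand here): assuming the profile says add_one
// is the hot target, compare the pointer against it and call it directly.
// Any other target still takes the original indirect call.
int call_promoted(Callback fn, int x) {
  if (fn == add_one) {
    return add_one(x);  // direct call, which a compiler could then inline
  }
  return fn(x);  // fallback: unchanged indirect call
}

// Sanity helper: both forms must produce the same result for any target.
void check_same(Callback fn, int x) {
  assert(call_indirect(fn, x) == call_promoted(fn, x));
}
```

The guard keeps the behaviour identical for every target; only the expected-hot path becomes a direct call.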

Clang commits

clang-tidy gained a new checker for redundant expressions on both sides of a binary operator. r267574.
A new clang-tidy check will warn for use of functions like atoi and atol that don't report conversion errors. r268100.
The nodebug attribute on a global or static variable will now suppress all debug info for that variable. r267746.
A number of OpenMP features gained codegen support, such as the map clause and target data directive. r267808, r267811.
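
As an illustration of why the atoi/atol check above is useful, the sketch below shows one way to parse an integer with std::strtol so that conversion errors are reported. The helper name parse_long is hypothetical and this is not taken from the clang-tidy check itself.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: parses a base-10 long and reports failure, unlike
// atoi/atol, which give the caller no way to detect a bad conversion.
bool parse_long(const char* text, long& out) {
  errno = 0;
  char* end = nullptr;
  const long value = std::strtol(text, &end, 10);
  if (end == text) return false;       // no digits were consumed
  if (*end != '\0') return false;      // trailing characters after the number
  if (errno == ERANGE) return false;   // value does not fit in a long
  out = value;
  return true;
}

// Small self-check contrasting the two approaches on non-numeric input.
void check_parse_long() {
  long v = 0;
  assert(std::atoi("abc") == 0);   // atoi cannot distinguish this from "0"
  assert(!parse_long("abc", v));   // strtol-based parsing can
}
```

A caller would test the return value before using the parsed number, e.g. `long n; if (parse_long(arg, n)) { ... }`.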

Other project commits

LLD now supports an -O0 option to produce output as quickly as possible. Currently this disables section merging at the cost of a potentially much larger output. r268056.
The symbol table in LLD's ELF linker has been redesigned with the intent of improving memory locality. The new design produces measurable speedups for the binaries tested in the commit message. r268178.
LLD's linkerscript support expanded to encompass comparison operators. r267832.
LLD performance on large executables has been improved by skipping scanRelocs on sections that are never mapped to memory at runtime (e.g. debug sections). r267917.

LLVM Foundation 2016 Announcements

Wed, 27 Apr 2016 08:48:00 +0000

With 2016 upon us, the LLVM Foundation would like to announce our plans for the year. If you are not familiar with the LLVM Foundation, we are a 501(c)(3) nonprofit that supports the LLVM Project and its community. We are best known for our LLVM Developers’ Meetings, but we are introducing several new programs this year.

The LLVM Foundation originally grew out of the need to have a legal entity to plan and support the annual LLVM Developers’ Meeting and LLVM infrastructure. However, as the Foundation was created we saw a need for help in other areas related to the LLVM project, compilers, and tools. The LLVM Foundation has established 3 main programs: Educational Outreach, Grants & Scholarships, and Women in Compilers & Tools.

Educational Outreach

The LLVM Foundation plans to expand its educational materials and events related to the LLVM Project and compiler technology and tools.

First, the LLVM Foundation is excited to announce the 2016 Bay Area LLVM Developers’ Meeting will be held November 3-4 in San Jose, CA. This year will be the 10th anniversary of the developer meeting which brings together developers of LLVM, Clang, and related projects. For this year’s meeting, we are increasing our registration cap to 400 in order to allow more community members to attend.

We also are investigating how we can support or be involved in other conferences in the field of compilers and tools. This may include things such as LLVM workshops or tutorials by sponsoring presenters, or providing instructional materials. We plan to work with other conference organizers to determine how the LLVM Foundation can be helpful and develop a plan going forward.

However, we want to do more for the community and have brainstormed some ideas for the coming year. We plan to create some instructional videos for those just beginning with LLVM. These will be short 5-10 minute videos that introduce developers to the project and get them started. Documentation is always important, but we find that many are turning to videos as a way to learn.

Grants & Scholarships

We are creating a grants and scholarships program to cover student presenter travel expenses to the LLVM Developers’ Meetings. However, we also hope to expand this program to include student presenter travel to other conferences where the student is presenting their LLVM related work. Details on this program will be published once they have been finalized.

Women in Compilers & Tools

Grace Hopper invented the first compiler and yet women are severely underrepresented in the field of compilers and tools. At the 2015 Bay Area LLVM Developers’ Meeting, we held a BoF on this topic and brainstormed ideas about what can be done. One idea was to increase LLVM awareness at technical conferences that have strong female participation. One such conference is the Grace Hopper Conference (GHC). The LLVM Foundation has submitted a proposal to present about LLVM and how to get involved with the LLVM open source community. We hope our submission is accepted, but if not, we are exploring other ways we can increase our visibility at GHC. Many of the other ideas from this BoF are being considered and actionable plans are in progress.

In addition to these 3 programs, we will continue to support the LLVM Project’s infrastructure. The llvm.org server will move to a new machine to increase performance and reliability.

We hope that you are excited about the work the LLVM Foundation will be doing in 2016. Our 2016 Plans & Budget may be viewed here. You may also contact our COO & President, Tanya Lattner ([email protected]) or the LLVM Foundation Board of Directors ([email protected]).

LLVM Weekly - #121, Apr 25th 2016

Mon, 25 Apr 2016 04:20:00 +0000

Welcome to the one hundred and twenty-first issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Congratulations to the eight students who have been selected for LLVM projects on Google Summer of Code this year. There's about a month before they start coding. The time between now and then is the 'community bonding period', so please do make them feel welcome.

The preliminary release schedule for LLVM/Clang 3.8.1 has been published. This would have a deadline of May 25th for requesting changes to be merged and would see the final release on June 15th.

On the mailing lists

A series of RFCs have been posted around the idea of adding an EfficiencySanitizer tool. Like the existing sanitizer tools, this would rely on compiler-based dynamic instrumentation in order to detect problems in user code. The goal is to collect useful information with overhead less than 5x, ideally closer to 3x. Separate threads have been started to discuss an EfficiencySanitizer cache fragmentation tool and a working set tool.
Sanjoy Patel has proposed removing the llvm.expect intrinsic, in favour of using metadata to represent the same information. There isn't currently full agreement on this.
Richard Trieu is seeking feedback on which floating point to boolean conversions should trigger warnings.
How do you add fixup information to a MachineInstruction? Tim Northover has the answer.
Elena Lepilkina has shared an RFC on adding support for custom metrics and test parameterisation to LNT. The feedback so far seems positive.
Phil Tomson is looking for advice on instruction scheduling in LLVM. As he notes in his email, it's part of LLVM that's seen a lot of changes over the past 8 years or so. I'm certainly interested in the answer here.

LLVM commits

An implementation of optimisation bisection support has landed. This helps to track down bugs by allowing optimisations to be selectively disabled at compile-time to identify the one introducing a miscompile. r267022.
The AArch64 and ARM thread pointer intrinsics have been merged to make a target-independent llvm.thread.pointer intrinsic. r266818.
The llvm.load.relative intrinsic has been added. r267233.
There have been more changes to DebugInfo which will require a bitcode upgrade. A script to perform this upgrade is linked in the commit message. r27296.
The ORC JIT API improved its support for RPC, including support for calling functions with return values. r266581.
The patchable-function function attribute has been introduced, indicating that the function should be easily patchable at runtime. r266715.
The IntrReadArgMem intrinsic property has been split into IntrReadMem and IntrArgMemOnly. r267021.
The MachineCombiner gained the ability to combine AArch64 fmul and fadd into an fmadd. r267328.
Scheduling itineraries were added for Sparc, specifically for the LEON processors. r267121.

Clang commits

A prototype of an include fixing tool was created. The indexer remains to be written. r266870.
A new warning has been added, which will trigger if the compiler tries to make an implicit instantiation of a template but cannot find the template definition. r266719.
Initial driver flags for EfficiencySanitizer were added. r267059.

Other project commits

The initial EfficiencySanitizer base runtime library was added to compiler-rt. It doesn't do much of anything yet. r267060.
LLD learned to support the linkerscript ALIGN command. r267145.
LLDB can now parse EABI attributes for an ELF input. r267291.

LLVM Weekly - #120, Apr 18th 2016

Mon, 18 Apr 2016 06:05:00 +0000

Welcome to the one hundred and twentieth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

This week has seen not one, but two articles about LLVM and profile-guided optimisation. Dig in to Johan Engelen's article about optimising D's virtual function calls with PGO, then read Geoffroy Couprie's article about PGO with Rust.

The next Cambridge (UK) social will be at 7.30pm on April 20th, at the Cambridge Blue.

Alex Denisov has written a blog post around the idea of building a mutation testing system using LLVM.

On the mailing lists

James Knight is requesting a way to test changes before committing them. Renato Golin had a thorough response.
Eric Fiselier has shared an RFC on packaging the proposed libc++ filesystem library.
Teresa Johnson has shared an RFC on the ThinLTO distributed backend interface.
Jeroen Dobbelaere is wondering if there's any interest in an LLVM social in Leuven, Belgium.
Mingwha Wang asks if there's any support for outlining in LLVM. You'll want to look at CodeExtractor.

LLVM commits

AtomicExpandPass learned to lower various atomic operations to __atomic_* library calls. The eventual aim is to move all atomic lowering from Clang to LLVM. r266115.
Targets can now define an inlining threshold multiplier, to e.g. increase the likelihood of inlining on platforms where calls are very expensive. r266405.
The ownership between DICompileUnit and DISubprogram has been reversed. This may break tests for your out-of-tree backend, but the commit has a link to a Python script to update your testcases. r266446.
llvm-readobj learned to print a histogram of an input ELF file's .gnu.hash section. r265967.
More target-specific support for the Swift calling convention (on ARM, AArch64, and X86) has landed. Also, a callee save register is used for the swiftself parameter. r265997, r266251.
A new allocsize attribute has been introduced. This indicates the given function is an allocation function. r266032.
analyzeSiblingValues has been replaced with a new lower-complexity implementation in order to reduce compile times. r266162.
The AMDGPU backend gained a skeleton GlobalISel implementation. r266356.
Every use of getGlobalContext other than the C API has been removed. r266379.

Clang commits

Clang gained support for the GCC ifunc attribute. r265917.
The __unaligned type qualifier was implemented for MSVC compatibility. r266415.
Support for C++ core guideline Type.6: always initialize a member variable was completed in clang-tidy. r266191.
A new clang-tidy checker for suspicious sizeof expressions was added. r266451.

Other project commits

The way relocations are applied in the new ELF linker has been reworked. r266158.
ELF LLD now supports parallel codegen for LTO using splitCodeGen. r266484.
Support for Linux on SystemZ in LLDB landed. r266308.

LLVM Weekly - #119, Apr 11th 2016

Mon, 11 Apr 2016 06:03:00 +0000

Welcome to the one hundred and nineteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Last week the slides from the recent EuroLLVM 2016 Developers' Meeting made it online. This week this has been followed by videos of the talks from the conference.

John Regehr has written about efficient integer overflow checking in LLVM, looking at cases where LLVM can and cannot remove unnecessary overflow checks, and how this might be improved.
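
For readers unfamiliar with the kind of overflow check being discussed, here is a minimal sketch of a checked signed addition in C++, written two ways. It is an illustration only and is not taken from the article; the function names are hypothetical, and __builtin_add_overflow is a Clang/GCC builtin, so the first variant assumes one of those compilers.

```cpp
#include <cassert>
#include <climits>

// Variant 1: compiler builtin. Returns true and stores the sum in `out`
// when a + b fits in an int; returns false on overflow.
bool checked_add_builtin(int a, int b, int& out) {
  return !__builtin_add_overflow(a, b, &out);
}

// Variant 2: portable pre-check that never performs the overflowing
// addition, so it avoids undefined behaviour entirely.
bool checked_add_portable(int a, int b, int& out) {
  if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
    return false;  // the mathematical sum is out of range
  }
  out = a + b;
  return true;
}

// Self-check: the two variants should agree on whether overflow occurred.
void check_agree(int a, int b) {
  int x = 0, y = 0;
  assert(checked_add_builtin(a, b, x) == checked_add_portable(a, b, y));
}
```

When the operands are known to be small, for example after a range check, a compiler may be able to prove the check can never fail and drop it; that is the sort of case the article examines.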

Version 0.13 of Pocl, the portable OpenCL implementation, has been released. This release works with LLVM/Clang 3.8 and 3.7, and adds initial OpenCL 2.0 support and improved HSA support.

Serge Guelton at QuarksLab has written up a really useful guide to implementing a custom directive handler in Clang.

Microsoft's Visual C++ team are looking for feedback on Clang/C2 (Clang with Microsoft CodeGen).

On the mailing lists

James Molloy has posted an RFC on adding support for constant folding calls to math.h functions on long doubles. Currently these functions aren't constant-folded as the internal APFloat class doesn't implement them and long double operations aren't portable. Solutions include adding support to APFloat, linking against libMPFR to provide compile-time evaluation, or recognising when the long double format of the host and target are the same, so the host math library can be called. From the responses so far, there seems to be some push-back on adding the libMPFR dependency.
Sanjoy Das has an RFC on adding a patchable-prologue attribute. This would be used to indicate that the function's prologue is compiled so as to provide support for easy hot-patching.
Ulrich Weigand has shared a patch for supporting LLDB on Linux on SystemZ. The patchset contains many big-endian fixes, and may be of interest to others looking at porting LLDB.

LLVM commits

The Swift calling convention as well as support for the 'swifterror' argument has been added. r265433, r265480.
Work on GlobalISel continues with many commits related to the assignment of virtual registers to register banks. r265445, r265440.
LLVM will no longer perform inter-procedural optimisation over functions that can be "de-refined". r265762.
The substitutions supported by lit are now documented. r265314.
Unrolled loops now execute the remainder in an epilogue rather than the prologue. This should produce slightly improved code. r265388.
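
The epilogue arrangement described in the last item can be sketched by hand in C++. This is only an illustration of the loop shape, with an assumed unroll factor of 4 and hypothetical function names; it is not what the LLVM unroller emits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference version: a plain loop over all elements.
long sum_simple(const std::vector<int>& v) {
  long total = 0;
  for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
  return total;
}

// Unrolled by 4, with the leftover (n % 4) iterations run in an epilogue
// loop after the main unrolled body rather than in a prologue before it.
long sum_unrolled_epilogue(const std::vector<int>& v) {
  const std::size_t n = v.size();
  const std::size_t main_end = n - (n % 4);  // largest multiple of 4 <= n
  long total = 0;
  std::size_t i = 0;
  for (; i < main_end; i += 4) {  // main unrolled body
    total += v[i];
    total += v[i + 1];
    total += v[i + 2];
    total += v[i + 3];
  }
  for (; i < n; ++i) {  // epilogue: at most 3 remaining iterations
    total += v[i];
  }
  return total;
}

// Self-check: both versions must agree on any input.
void check_sums(const std::vector<int>& v) {
  assert(sum_simple(v) == sum_unrolled_epilogue(v));
}
```

With the remainder at the end, the unrolled body always starts at index 0, which is one plausible reason the generated code can be slightly better.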

Clang commits

Clang gained necessary support for the Swift calling convention. r265324.
New flags -fno-jump-tables and -fjump-tables can be used to disable/enable support for jump tables when lowering switch statements. r265425.
TargetOptions is now passed through all the TargetInfo constructors. This will allow target information to be modified based on the ABI selected. r265640.
A large number of intrinsics from emmintrin.h now have Doxygen docs. r265844.

Other project commits

clang-tidy gained a new check to flag initializers of globals that access extern objects, leading to potential order-of-initialization issues. r265774.
LLD's ELF linker gained new options --start-lib, --end-lib, --no-gnu-unique, --strip-debug. r265710, r265717, r265722.

LLVM Weekly - #118, Apr 4th 2016

Mon, 04 Apr 2016 04:22:00 +0000

Welcome to the one hundred and eighteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Almost all slides from the recent EuroLLVM conference are now available online for your enjoyment.

Some readers may be interested in a new paper about the 'LifeJacket' tool for verifying precise floating-point optimisations in LLVM.

Christian Neumüller has written a new tool for syntax highlighting and cross-referencing C and C++ source using libclang.

On the mailing lists

Chandler Carruth suggests that just like commits that break codegen are immediately reverted, commits that introduce large, especially super-linear compile time regressions should be reverted. There's a lot of agreement with the general principle in replies, though some point out that much of the slowdown across the past few LLVM and Clang versions is due to a large number of small changes.
James Molloy is interested in discussing how LLVM could learn the size of a particular std::vector and omit unnecessary checks etc.
Nick Johnson has a couple of questions about IfConversion in LLVM. They haven't been answered yet, but I know I'd be interested in the answer.
Russell Wallace has kicked off a very useful thread about generating calls to existing functions from the JIT.
Zachary Turner is interested in people's feelings on requiring a minimum of MSVC 2015 to compile LLVM and Clang. The general feeling so far is that it's too early for this, as typically the policy is to support the last two major MSVC releases.
Hans Wennborg has kindly highlighted a recent API change to TargetFrameLowering::eliminateCallFramePseudoInstr that will be of interest to maintainers of out-of-tree backends.
Matt Masten has posted an RFC on vectorizing loops with calls to math functions using SVML (Intel's short vector math library).
Eric Christopher has posted an RFC on migrating debug type information generation from the backends to the frontend.
Ke Bai's memory scope proposal hasn't really seen any responses up to now. Philip Reames does share some feedback, but notes it's unlikely this proposal could realistically be merged in to LLVM unless there is more interest. If this is an area that interests you, then please do have a good read of Ke's proposal.

LLVM commits

The Lanai backend has landed. r264578.
A new llvm.experimental.guard intrinsic has been added. As described in the accompanying documentation, along with deoptimization operand bundles this allows frontends to express guards or checks on optimistic assumptions made during compilation. r264976.
Support for a number of new Altivec instructions has been added. Amazingly, this includes BCD (Binary Coded Decimal) instructions. r264568.
The concept of MachineFunctionProperties has been introduced, with the first property being AllVRegsAllocated. This allows passes to declare that they require a particular property, in this case requiring that they be run after regalloc. r264593.
On X86, push will now be used in preference to mov at all optimisation levels (before it was only enabled for -Os). r264966.
LLVM's support library can now compute SHA1 hashes. This is used to implement a 'build-id'. r265094, r265095.
When metadata is only referenced in a single function, it will now be emitted just in that function block. The aim of this is to improve the potential of lazy-loading. r265226.

Clang commits

The Lanai backend is now supported in the Clang driver. r264655.
libTooling gained a handy formatAndApplyAllReplacements function. r264745.

Other project commits

Parts of LLD are starting to use the new Error handling. r264910, r264921, r264924, and more.
Infrastructure was added to LLD for generating thunks (as required on platforms like MIPS when calling PIC code from non-PIC). r265059.

My Little LLVM: Undefined Behavior is Magic!

Fri, 01 Apr 2016 00:00:00 +0000

There’s been lots of discussion online (and then quite some more) about compilers abusing undefined behavior. As a response the LLVM compiler infrastructure is rebranding and adopting a motto to make undefined behavior friendlier and less prone to corruption.

The re-branding puts to rest a long-standing issue with LLVM’s “dragon” logo actually being a wyvern with an upside-down head, a special form of undefined behavior in its own right. The logo is now clearly a pegasus pony.

Another great side-effect of this rebranding is increased security by auto-magically closing all vulnerabilities used by the hacker who goes by the pseudonym “Pinkie Pie”.

These new features are enabled with the -rainbow clang option, in honor of Rainbow Dash’s unary name.

A Few Examples

C++’s memory model specifies that data races are undefined behavior. It is well established that no sane compiler would optimize atomics, LLVM will therefore supplement the Standard’s happens-before relationship with an LLVM-specific happens-to-work relationship. On most architectures this will be implemented with micro-pause primitives such as x86’s rep rep rep nop instruction.

Shifts by bit-width or larger will now return a normally-distributed random number. This also obsoletes rand() and std::random_shuffle.

bool now obeys the rules of truthiness to avoid that annoying “but what if it’s not zero or one?” interview question. Further, incrementing a bool with ++ now does the right thing.

Atomic integer arithmetic is already specified to be two’s complement. Regular arithmetic will therefore now also be atomic. Except when volatile, but not when volatile atomic.

NaNs will now compare equal, subnormals are free to self-classify as normal / zero / other, negative zero simply won’t be a thing, IEEE-754 has been upgraded to PONY-754, floats will still round with style, and generating a signaling NaN is now guaranteed to not be quiet by being equivalent to putchar('\a'). While we’re at it none of math.h will set errno anymore. This has nothing to do with undefined behavior but seriously, errno?

Type-punning isn’t a thing anymore. We’re renaming it to type-pony-ing, but it doesn’t do anything surprising besides throw parties. AND WHO DOESN’T LIKE PARTIES‽ EVEN SECURITY PEOPLE DO! 🎉

A Word From Our Sponsors

The sanitizers (especially undefined behavior sanitizer, address sanitizer and thread sanitizer) are great tools when dealing with undefined behavior. Use them on your tests, combine them with fuzzers, try them as cupcake topping! Be warned: their runtimes aren’t designed to be secure and you shouldn’t ship them in production code!

Cutie Marks

To address the horse in the room: we’ve left the new LLVM logo’s cutie mark as implementation-defined. Different instances of the logo can use their own cutie mark to illustrate their proclivities, but must clearly document them.

Posted by JF Bastien and Michael Spencer.

LLVM Weekly - #117, Mar 28th 2016

Mon, 28 Mar 2016 06:22:00 +0000

Welcome to the one hundred and seventeenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Google Summer of Code applications are now closed. Applicants and interested third-parties can look forward to finding out which projects were selected on April 22nd.

Ramkumar Ramachandra has written a blog post giving a whirlwind tour of the internals of LLVM's fast register allocator (FastRegAlloc.cpp).

Alex Denisov has blogged about the various test suites used within the LLVM project.

Version 1.13 of the TTA-based Co-design Environment (TCE) has been released. This adds support for LLVM 3.8.

On the mailing lists

Last week, Jia Chen's thread about pointer analysis in LLVM had yet to receive replies. It's now received some extensive discussion. Daniel Berlin argues there is lower hanging fruit than improving AA. It does seem there's interest in getting cfl-aa turned on by default, which will require some careful bug fixing.
The issue of LLD and fatal errors has again surfaced on the mailing list. The more productive line of discussion focused around what should be expected of LLD when given maliciously corrupted inputs. Rui Ueyama is suggesting adding a verifier pass which could be optionally enabled or disabled.
Andrew Kaylor has shared an RFC on adding new support to help triage optimisation-related failures. Optimisation passes are assigned numbers which can be used to help bisect a failure. Michael Gottesman reports a similar approach used in Swift.
Samuel F Antao has summarised recent discussion on unified offloading support in Clang.
Duncan P. N. Exon Smith has proposed an RFC on lazy-loading of debug info metadata.
What does it mean for a platform to support a type but not support a particular operation? Krzysztof Parzyszek is kind enough to provide a clear and straight-forward answer.
Applications for Google Summer of Code have closed, but the list of new project ideas from Philip Reames is a good starting point for anybody looking for a way to get stuck in to making impactful contributions to LLVM.

LLVM commits

A new utility, update_test_checks.py was added to update opt or llc test cases with new FileCheck patterns. r264357.
Non-power-of-2 loop unroll count pragmas are now supported. r264407.
The NVPTX backend gained a new address space inference pass. r263916.
Instances of Error are now convertible to std::error_code. Conversions are also available between Expected<T> and ErrorOr<T>. r264221, r264238.
Hexagon gained support for run-time stack overflow checking. r264328.
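
The unroll-count item above can be exercised from C++ through Clang's loop pragma. This is a minimal sketch that assumes a Clang toolchain; the function name sum_of is hypothetical, and whether the optimizer really unrolls by the requested factor depends on the compiler version and optimization flags in use.

```cpp
#include <cassert>

// Hypothetical helper: sums n integers. The pragma requests an unroll
// factor of 3, i.e. a count that is not a power of two.
inline int sum_of(const int *data, int n) {
    int total = 0;
#if defined(__clang__)
#pragma clang loop unroll_count(3)
#endif
    for (int i = 0; i < n; ++i)
        total += data[i];
    return total;
}
```

The pragma is only a hint about code generation; the function returns the same result with or without it.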

Clang commits

Clang now supports lambda capture of *this by value. r263921.
The bitreverse builtins are now documented. r264203.
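
To illustrate by-value capture of *this (a C++17 feature), here is a small sketch. The Counter type and its member names are hypothetical and serve only to contrast [*this] with the older [this] capture.

```cpp
#include <cassert>
#include <functional>

// Hypothetical type used only for illustration.
struct Counter {
    int value = 0;

    // [*this] copies the whole object into the closure, so the lambda
    // keeps seeing the state as it was when the lambda was created.
    std::function<int()> snapshot() const {
        return [*this] { return value; };
    }

    // [this] captures only the pointer: the lambda observes later
    // changes and must not outlive the object.
    std::function<int()> live() {
        return [this] { return value; };
    }
};

// Returns true if the two capture modes behave as described above.
inline bool capture_modes_differ() {
    Counter c;
    c.value = 1;
    auto by_value = c.snapshot();
    auto by_pointer = c.live();
    c.value = 2;
    return by_value() == 1 && by_pointer() == 2;
}
```

Capturing a copy is mainly useful when the lambda may run after the original object has gone away, for example in asynchronous callbacks.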

Other project commits

LLDB will fix inputted expressions with 'trivial' mistakes automatically. r264379.
ThreadSanitizer debugging support was added to LLDB. r264162.
Polly gained documentation to describe how it fits in to the LLVM pass pipeline. r264446.
LLDB has been updated to handle the UTF-16 APIs on Windows. r264074.

LLVM Weekly - #116, Mar 21st 2016

Mon, 21 Mar 2016 05:06:00 +0000

Welcome to the one hundred and sixteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

If you're a student and would like to get paid to work on an LLVM-related project over the summer then do consider applying for Google Summer of Code with LLVM. More details about Summer of Code are available here. The deadline for applications is this Friday, March 25th at 1900 GMT. I'd also encourage you to look at lowRISC's project ideas if you have an interest in open source hardware.

Stephen Kelly has written about his new Clang-based tool for porting a C++ codebase to use almost-always-auto. As was pointed out on Twitter, Ryan Stortz from Trail of Bits has a tool that removes auto and does roughly the opposite.

Honza Hubička has written up his experiments of building LibreOffice with GCC6 and LTO. This includes a comparison to a build using LLVM and Clang.

Nick Clifton has shared an update for February and March on the GNU toolchain that may be of interest.

The developer of the Capstone disassembly framework and the Unicorn multi-architecture simulator is running a funding campaign for the Keystone multi-architecture assembler framework. Like Capstone, this will build on LLVM but also aims to go beyond it.

On the mailing lists

Ehsan Amiri has shared an RFC on a change in the InstCombine canonical form. In the ensuing discussion, the question of the current state of the typeless pointer work was raised and answered.
Sean Silva has shared some recent performance observations about LLD with --build-id. Adding support for this option has added a measurable slowdown which should be considered when reviewing comparisons with other linkers from before it was added.
I normally prefer to link to mailing list threads where there has already been some discussion or attempts at answers, but I think this one is worth some more eyeballs. Jia Chen is interested in the tradeoffs in LLVM using more sophisticated pointer analyses. There are no responses at the time of writing, but it seems an interesting question.
Huw Davies has proposed a new IR attribute, incoming-stack-align. This is needed for Wine which may require functions to have an ABI stack alignment different to the host's alignment.

LLVM commits

A new Error support class has been added to support structured error handling. See the associated updates to the LLVM programmer's manual for more info. r263609.
New documentation was committed for advanced CMake build configurations. r263834.
Support was added for MIPS32R6 compact branches. r263444.
The MemCpyOptimizer will now attempt to reorder instructions in order to create an optimisable sequence. r263503.
llvm-readobj learnt to print sections and relocations in the GNU style. r263561.

Clang commits

Attributes have been added for the preserve_mostcc and preserve_allcc calling conventions. r263647.
clang-format will handle some cases of automatic semicolon insertion in JavaScript. r263470.
Clang learned to convert some Objective-C message sends to runtime calls. r263607.

Other project commits

AddressSanitizer is now supported on mips/mips64 Android. r263261.
The documentation on the LLD linker has added a few numbers to give an idea of the sort of inputs it needs to handle. e.g. Chrome with debug info contains roughly 13M relocations, 6.3M symbols, 1.8M sections and 17k files. r263466.

LLVM Weekly - #115, Mar 14th 2016

Mon, 14 Mar 2016 04:56:00 +0000

Welcome to the one hundred and fifteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

We have an LLVM-related research position currently being advertised here at the University of Cambridge Computer Lab. If you'd like an informal chat about what it's like working in this group or on this project please don't hesitate to get in touch with me.

News and articles from around the web

LLVM and Clang 3.8 have now been released. Check out the LLVM and Clang release notes for a run-down of the new features.

It's GDC this week and if you're attending you may be interested that there's an LLVM meetup scheduled for Thursday.

Felix Angell has a detailed blog post introducing generating LLVM IR from Go.

On the mailing lists

Jason Henline has posted a very detailed RFC on creating a new parallel runtime library. StreamExecutor wraps both the CUDA and OpenCL runtimes and is used internally at Google.
Ed Maste has shared an update on linking the FreeBSD base system using LLD. With a few workarounds, the full amd64 FreeBSD system is now buildable.
Vedant Kumar has shared an RFC on removing redundant profile counter updates.
Sean Silva is seeking to formalize the 'revert for more design review' policy. Overall, the comments seem to be positive.
With EuroLLVM coming up this week, people have been advertising their birds of a feather sessions. e.g. the LLVM on PowerPC and SystemZ session and the session on surviving downstream.
Rafael Espíndola reports that compilation with LLVM and Clang has been getting slower over time. Hal Finkel has some good input on potential areas for improvement.

LLVM commits

Loop invariant code motion learnt to exploit the fact that a memory location is known to be thread-local. r263072.
A new llvm.experimental.deoptimize intrinsic has been added. r26328.
A ThinLTOCodeGenerator was added in order to provide a proof-of-concept implementation. r262977.
The Sparc backend gained support for co-processor condition branching and conditional traps. r263044.

Clang commits

Clang gained support for the [[nodiscard]] attribute. r262872.
New AST matchers were added for addrLabelExpr, atomicExpr, binaryConditionalOperator, designatedInitExpr, designatorCountIs, hasSyntacticForm, implicitValueInitExpr, labelDecl, opaqueValueExpr, parenListExpr, predefinedExpr, requiresZeroInitialization, and stmtExpr. r263027.
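
As a brief illustration of the [[nodiscard]] attribute mentioned above, consider the following sketch. The Status type and function names are hypothetical; the attribute asks the compiler to warn when a caller ignores the returned value.

```cpp
#include <cassert>

// Hypothetical status type; names are illustrative only.
enum class Status { Ok, Failed };

// Marking the function [[nodiscard]] asks the compiler to warn when a
// caller discards the returned Status.
[[nodiscard]] inline Status parse_digit(char c, int &out) {
    if (c < '0' || c > '9')
        return Status::Failed;
    out = c - '0';
    return Status::Ok;
}

inline int digit_or_default(char c, int fallback) {
    int value = 0;
    // parse_digit(c, value);  // ignoring the result would draw a warning
    if (parse_digit(c, value) == Status::Ok)
        return value;
    return fallback;
}
```

Here digit_or_default('7', -1) yields 7, while a non-digit input falls back to the supplied default.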

Other project commits

Error and warning messages in LLD are now more consistent. r263125.
Documentation on the new ELF and COFF LLD linkers has been updated. r263336.

LLVM Weekly - #114, Mar 7th 2016

Mon, 07 Mar 2016 04:40:00 +0000

Welcome to the one hundred and fourteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

LLVM has been accepted as a mentoring organisation in Google Summer of Code 2016. See here for more about what that means. If you're a student who would like to get paid to work on LLVM over the summer, you should definitely consider applying. Also take a look at the full list of organisations in GSoC 2016. If you have an interest in open source hardware, in my (biased) opinion you should definitely look at lowRISC's listed project ideas.

LLVM and Clang 3.8 'final' has been tagged. A release should be imminent.

There was a big C++ committee meeting last week. You can find summaries here and here. If you were hoping for modules, concepts, UFCS, ranges, or coroutines in C++17 I'm afraid you're in for disappointment. Many new features will be available in C++ Technical Specifications though.

llvmlite 0.9.0 has been released. llvmlite is a light-weight Python binding for LLVM. If you're wondering how to get started with llvmlite, then check out this recent blog post from Ian Bertolacci on writing fibonacci in LLVM with llvmlite.

Andi McClure has written a really interesting blog post about writing software without a compiler. In this case, generating LLVM IR from LuaJIT.

On the mailing lists

John McCall has posted an RFC on implementing the Swift calling convention in LLVM and Clang. Feedback is generally positive and there's an interesting discussion on the handling of Swift's errors in the calling convention.
Roel Jordans has posted some thoughts and questions in preparation for the EuroLLVM birds of a feather sessions on compilers in education.
Peter Collingbourne has shared a new RFC on adding a new ABI for virtual calls, termed the 'relative ABI'. He'd also like to change how virtual calls are represented in the IR.
Grigori Fursin from Dividiti shared some recent work on crowdtuning compiler optimisation heuristics.
What is the current status of garbage collection with statepoints in LLVM? Philip Reames and Sanjoy Das have the answer!
Xinmin Tian has shared a proposal for function vectorisation and loop vectorisation with function calls.
Akira Hatanaka is interested in comments to his RFC on more precise lifetime.end metadata. In the given example, three local variables have non-overlapping lifetimes and could potentially use the same stack slot, but this isn't currently done.

LLVM commits

MemorySSA has gained an initial update API. r262362.
TableGen can now check at compile time that a scheduling model is complete. r262384.
New comments in PassBuilder give a description of what trade-offs are expected for each optimisation level. r262196.
LoopLoadElimination is now enabled by default. r262250.
A new patch adding infrastructure for profile-guided optimisation enhancements in the inliner has landed. r262636.
Experimental ValueTracking code which tried to infer more precise known bits using implied dominating conditions has been removed. Experiments didn't find it to be profitable enough, but it may still be useful to people wanting to experiment out of tree. r262646.

Clang commits

Clang's C API gained an option to demote fatal errors to non-fatal errors. This is likely to be useful for clients like IDEs. r262318.
clang-cl gained initial support for precompiled headers. r262420.
An -fembed-bitcode driver option has been introduced. r262282.
Semantic analysis for the swiftcall calling convention has landed. r262587.
Clang's TargetInfo will now store an actual DataLayout instance rather than a string. r262737.

Other project commits

LLDB can now read line tables from Microsoft's PDB debug info files. r262528.
The LLVM test-suite gained the ability to hash generated binaries and to skip tests if the hash didn't change since a previous run. r262307.
LLVM's OpenMP runtime now supports the new OpenMP 4.5 doacross loop nest and taskloop features. r262532, r262535.

LLVM Weekly - #113, Feb 29th 2016

Mon, 29 Feb 2016 06:58:00 +0000

Welcome to the one hundred and thirteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

News and articles from around the web

LLVM and Clang 3.8RC3 has been tagged.

EuroLLVM 2016 is less than a month away. If you want to attend, be sure to register.

The Red Hat blog has a summary of new features in the upcoming GCC 6 release.

The Meeting C++ blog has a helpful summary of a subset of the proposals for the next C++ committee meeting.

On the mailing lists

Chandler Carruth has suggested moving the LLVM test-suite repository to GitHub. In response to some concerns, Chris Lattner points out that using GitHub in this case doesn't mean abandoning the current development workflow, it just means it can be augmented with GitHub-style pull requests for those who prefer it. Chandler summarised the thread and provided a list of next steps.
Sanjoy Das pointed out a potential soundness issue with the available_externally linkage type. This triggered a very long discussion. James Knight pointed out the same issue could happen with normal functions in a shared library. There was some back and forth between Hal Finkel and Chandler Carruth on the best approach to addressing this problem.
Philip Reames asks whether a PHI depending on another PHI in the same basic block is valid. It's currently accepted by the verifier but arguably shouldn't be. So far, nobody has argued that it should be valid.
Matthias Braun kicked off a discussion on better defining the semantics of reserved and unallocatable registers. After more discussion, he followed up with a revised definition.
David Li has posted a proposal for supporting in-process merging of profile data.

LLVM commits

The Sparc backend now contains definitions for all registers and instructions defined in the Sparc v8 manual. r262133.
LLVM gained a basic LoopPassManager, though it currently only contains dummy passes. r261831.
A number of TargetInstrInfo predicates now take a reference to a MachineInstr rather than a pointer. r261605.
The WebAssembly backend gained redzone support for the userspace stack. r261662.

Clang commits

Whole-program vtable optimisation is now available in Clang using the -fwhole-program-vtables flag. r261767.
Clang gained __builtin_canonicalize which returns the platform-specific canonical encoding of a floating point number. r262122.
A hasAnyName matcher was added. r261574.
The pointer arithmetic checker has been improved to report fewer false positives. r261632.

Other project commits

The new ELF linker gained support for identical code folding (ICF). This reduces the size of an LLD binary by 3.6% and of a Clang binary by 2.7%. As described in the commit message, this is not a "safe" version of ICF as implemented in GNU gold, so will cause issues if the input relies on two distinct functions always having distinct addresses. r261912.
Polly's tree now contains an update_check.py script that may be useful to other LLVM devs. It updates a FileCheck-based lit test by updating the CHECK: lines with the actual output of the RUN: command. r261899.
LLDB gained a new set of plugins to help debug Java programs, specifically Java code JIT-ed by the Android runtime. r262015.
The new OpenMP 4.5 affinity API is now supported in LLVM's openmp implementation. r261915.
The new ELF linker gained support for the -r command-line option, which produces relocatable output (partial linking). r261838.
The CMake/lit runner for SPEC in the LLVM test-suite can now run the C CPU2006 floating point benchmarks (but not the Fortran ones). r261816.
The old ELF linker has been deleted from LLD. r262158.

LLVM Weekly - #112, Feb 22nd 2016

Mon, 22 Feb 2016 03:29:00 +0000

Welcome to the one hundred and twelfth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Filip Pizlo has written a fantastic article introducing the new B3 JIT compiler for WebKit's JavaScriptCore. This intends to replace LLVM as the optimising backend to their fourth-tier JIT. The article describes in detail their reasons for moving away from LLVM (mainly compile-time) and the design trade-offs made, such as in reducing memory allocations and minimising pointer-chasing in the IR. This reminds me of the trade-offs Mike Pall made in the LuaJIT 2.0 IR. Philip Reames also shared some initial thoughts on B3. I know some people have expressed disappointment about WebKit moving away from LLVM, but if you'll allow me to insert just a little bit of editorial I'd argue B3 is a very positive development for LLVM and the wider compiler community. B3 explores a different set of design trade-offs to those chosen for LLVM, and these sorts of changes are probably easiest to explore in a fresh codebase. Thanks to this write-up (and hopefully future B3/AIR documentation), we can learn from the B3 developers' experiences and consider if some of their choices will make sense for LLVM. It's also good to remember that LLVM isn't the only feasible route for code generation and optimisation, and we shouldn't treat LLVM's design choices as the one true way to do things. Impressively, B3 was developed to its current state in only 6 months of developer-time.

Version 0.17.0 of LDC, the LLVM-based compiler for the D programming language has been released. You can view a detailed changelog here.

GCC6 will feature a whole bunch of new warnings, and this blog post details many of them.

The schedule for EuroLLVM 2016 has now been posted. This will be held March 17th-18th in Barcelona.

On the mailing lists

Bob Wilson proposes that format-security warnings in Clang default to error. Nico Weber posted a handy summary of the thread.
Sanjoy Das has posted an RFC on adding guard intrinsics to LLVM. These would be used in a similar way to the Check opcode in WebKit's new B3 compiler.
Alina Sbirlea proposes adding bitcode tests to the LLVM test-suite. Hal Finkel suggests going further and just pulling Halide in to the LLVM test-suite as a front-end example that should provide greater test coverage.
Andrew Trick shared some thoughts on LLVM in light of the WebKit B3 announcement. "Even when LLVM's compile time problems are largely solved, and I believe they can be, there will always be systemic compile time and memory overhead from design decisions that achieve generality, flexibility, and layering. These are software engineering tradeoffs."

LLVM commits

The PPCLoopDataPrefetch pass has been moved to Transforms/Scalar/LoopDataPrefetch in preparation for it becoming a target-agnostic pass. r261265.
The cmpxchg LLVM instruction now allows pointer type operands. r261281.
The X86 backend gained support for a new stack symbol ordering optimisation. This is primarily intended to reduce code size, and produces small but measurable improvements across some SPEC CPU 2000 benchmarks. r260917.
The LLVM C API has been extended to allow it to be used to manipulate the datalayout. r260936.
Some major work on the LazyCallGraph has been checked in. r261040.
The AMDGPU backend gained a basic disassembler. r261185.
The PostOrderFunctionAttrs pass has been ported to the new pass manager. As described in the commit message, this actually represents a major milestone. r261203.
The Hexagon backend gained support for thread-local storage. r261218.

Clang commits

A nullPointerConstant AST matcher was added. r261008.
Clang gained a -Wcomma warning, which will warn for most uses of the builtin comma operator. r261278.
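
As an illustration of the kind of code -Wcomma is aimed at, here is a sketch with hypothetical function names. The exact diagnostics depend on the compiler version; to my understanding, casting the discarded operand to void is the conventional way to mark a comma as intentional.

```cpp
#include <cassert>

// Hypothetical example: the builtin comma operator evaluates its left
// operand, discards the result, and yields the right operand.
inline int last_of(int a, int b) {
    // return (a, b);  // the kind of use -Wcomma is meant to flag
    return (static_cast<void>(a), b);  // explicit discard, same result
}

// An intentional use: bump a counter, then yield a value.
inline int bump_and_return(int &counter, int b) {
    return (static_cast<void>(++counter), b);
}
```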

Other project commits

LLD has sprouted a release notes document. r260960.
The LLVM test-suite's CMake build system saw a number of fixes for SPEC. r261470.

LLVM Weekly - #111, Feb 15th 2016

Mon, 15 Feb 2016 03:58:00 +0000

Welcome to the one hundred and eleventh issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

There has been a new release of the CilkPlus compiler. This includes an update to the latest LLVM and Clang trunk. CilkPlus implements the Cilk Plus language extensions for data and task parallelism in Clang.

Some more papers have appeared from the C++ standards committee. P0225R0, or as you may prefer to call it "Why I want Concepts, and why I want them sooner rather than later" is worth a read. There have also been a few other recently published papers on iterator facades, the filesystem technical specification, and unified function call syntax.

On the mailing lists

Jacques Pienaar has proposed upstreaming the 'Lanai' backend. This is for a CPU design used internally at Google, and the posting of these patches did attract some attention in the press. A good chunk of the ensuing discussion focused on what the bar should be for accepting a new backend upstream. There seem to ultimately be far more people for the upstreaming than against it, but some concern was raised about the ability for others to test the generated code without access to hardware or even simulators.
Natanael Ramos recently worked with LLVM for his bachelor thesis, and as a result wrote and submitted a tutorial for writing a new LLVM register allocator. This can also be found on github.
Nolan has been working on an experimental 6502 backend, and sought some help with a memory operand folding problem. He later followed up to the list with his solution, and David Chisnall added some extra thoughts on potential approaches to targeting 6502 or similar architectures.
Hans Wennborg is looking for help in expanding the release notes for the 3.8 release.
Vaivaswatha Nagaraj has been working on a control structure analysis capable of detecting control structures in the CFG and is seeking feedback on his code.
Lang Hames has followed up to his RFC on error handling in LLVM libraries with a detailed post summarising his thoughts and responding to some feedback.
Sadly, CMake's current Ninja generator is non-deterministic. The good news is there is already a fix in upstream CMake.
Peter Collingbourne prototyped a change to reduce DWARF emitter memory consumption. Early results are very positive.
Philip Reames proposes removing the inaccessiblememonly attribute from the 3.8 branch, on the grounds that the major motivating patch was reverted, there has been no further development, and including it in a release may pose a backwards-compatibility concern. There appears to be agreement so far in the responses.
LLVM will be applying for inclusion in the Google Summer of Code this year. If you have a project listed on the 'open projects' page, please review and update it if necessary, or suggest new projects.

LLVM commits

The WholeProgramDevirt pass has been added. This implements whole program optimization of virtual calls where the list of callees is known to be fixed. r260312.
The AVR backend upstreaming continues with the addition of the AVR tablegen instruction definitions. r260363.
There's been a bunch of other work on the new global instruction selection mechanism this week, but the commits I'd pick out are the addition of support for translating Add instructions and for lowering returns. It is currently being tested with the AArch64 backend. r260549, r260562, r260600.
The AArch64 backend gained support (including a scheduling model) for the Qualcomm Kryo CPU. r260686.
LoopUnrollAnalyzer has been abstracted out from LoopUnrollPass, and gained unit tests for its functionality. r260169.
llvm-config gained preliminary Windows support. r260263.
The details of the convergent attribute have been clarified in the language reference. The convergent attribute will now be removed on functions which provably don't converge or invoke any convergent functions. r260316, r260319.

Clang commits

It is now possible to perform a 3-stage Clang build using CMake. It is suggested in the commit message this may be useful for detecting non-determinism in the compiler by verifying stage2 and stage3 are identical. r260261.
ARMv8.2-A can be targeted using appropriate Clang options. r260533.
Clang's CMake build system learned the CLANG_DEFAULT_CXX_STDLIB option to set the default C++ standard library. r260662.

Other project commits

The new LLD ELF linker gained initial link-time optimisation support. r260726.
LLDB has seen some more updates for Python 3 support, though not yet enough for a clean testsuite run. r260721.

LLVM Weekly - #110, Feb 8th 2016

Mon, 08 Feb 2016 08:44:00 +0000

Welcome to the one hundred and tenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Slides from the LLVM devroom at FOSDEM last weekend are now available online. Unfortunately there was an issue with the recording of the talks so videos will not be available.

JavaScriptCore's FTL JIT is moving away from using LLVM as its backend, towards B3 (Bare Bones Backend). This includes its own SSA IR, optimisations, and instruction selection backend.

Source tarballs and binaries are now available for LLVM and Clang 3.8-RC2.

The Zurich LLVM Social is coming up this Thursday, February 11th at 7pm.

Jeremy Bennett has written up a comparison of the Clang and GCC command-line flags. The headline summary is that 397 work in both GCC and LLVM, 433 are LLVM-only and 598 are GCC-only.

vim-llvmcov has been released. It is a vim plugin to show code coverage using the llvm-cov tool.

On the mailing lists

Mehdi Amini has posted an RFC on floating point environment and rounding mode handling in LLVM. The work started all the way back in 2014 and has a whole bunch of patches up for review. Chandler Carruth has responded with a detailed description of his concerns about the current design, and his proposed alternative seems to be getting a lot of positive feedback.
Morten Brodersen has recently upgraded a number of applications from the old JIT to the new MCJIT under LLVM 3.7.1 but has found significant performance regressions. Some other respondents have seen similar issues, either in compilation time or in reduced code quality in the generated code. Some of the thread participants will be providing specific examples so they can be investigated. It's possible the issue is something as simple as a different default somewhere. Benoit Belley noted they saw regressions due to their frontend's use of allocas in 3.7.
Lang Hames kicked off a long discussion about error handling in LLVM libraries. Lang has implemented a new scheme and is seeking feedback on it. There's a lot of discussion that unfortunately I haven't had time to summarise properly. If error handling design interests you, do get stuck in.
Adrian McCarthy has written up details on the recent addition of minidump support to LLDB. Minidumps are the Windows equivalent of a core file.
Juan Wajnerman is looking at adding support for multithreading to the Crystal language, and has a question about thread local variables. LLVM won't re-load the thread local address, which causes issues when a thread local variable is read in a coroutine running on one thread which is then suspended and continued on a different thread. This is apparently a known issue, covered by PR19177.
Steven Wu has posted an RFC on embedding bitcode in object files. The intent is to upstream support that already exists in Apple's fork. Understandably some of the respondents asked how this relates to the .llvmbc section that the Thin-LTO work is introducing. Steven indicates it's pretty much the same, but for Mach-O rather than ELF and that he hopes to unify them during the upstreaming.

LLVM commits

LLVM now has a memory SSA form. This isn't yet used by anything in-tree, but should form a very useful basis for a variety of analyses and transformations. This patch has been baking for a long time, first being submitted for initial feedback in April last year. r259595.
A new loop versioning loop-invariant code motion (LICM) pass was introduced. This enables more opportunities for LICM by creating a new version of the loop guarded by runtime checks to test for potential aliases that can't be determined not to exist at compile-time. r259986.
LazyValueInfo gained an intersect operation on lattice values, which can be used to exploit multiple sources of facts at once. The intent is to make greater use of it, but already it is able to remove a half range-check when performing jump-threading. r259461.
The SmallSet and SmallPtrSet templates will now error out if created with a size greater than 32. r259419.
The ability to emit errors from the backend for unsupported features has been refactored, so BPF, WebAssembly, and AMDGPU backends can all share the same implementation. r259498.
A simple pass using LoopVersioning has been added, primarily for testing. The new pass will fully disambiguate all may-aliasing memory accesses no matter how many runtime checks are required. r259610.
The way bitsets are used to encode type information has now been documented. r259619.
You can now use the flag -DLLVM_ENABLE_LTO with CMake to build LLVM with link-time optimisation. r259766.
TableGen's AsmOperandClass gained the IsOptional field. Setting this to 1 means the operand is optional and the AsmParser will not emit an error if the operand isn't present. r259913.
There is now a scheduling model for the Exynos-M1. r259958.
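
The loop versioning LICM entry above describes guarding a specialised copy of a loop with a runtime alias check. As a rough source-level analogy only (this is hand-written C++, not what the pass emits, and the function name addScalar is made up for illustration), the transformation looks something like this:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical example: adds *scalar to every element of dst. If scalar might
// point into dst, the load of *scalar cannot simply be hoisted out of the loop.
void addScalar(int *dst, std::size_t n, const int *scalar) {
  assert(dst != nullptr && scalar != nullptr);
  auto lo = reinterpret_cast<std::uintptr_t>(dst);
  auto hi = reinterpret_cast<std::uintptr_t>(dst + n);
  auto s = reinterpret_cast<std::uintptr_t>(scalar);

  // Runtime check: does the scalar overlap the destination range?
  bool mayAlias = s + sizeof(int) > lo && s < hi;

  if (!mayAlias) {
    // Versioned loop: the load is invariant, so it is hoisted.
    const int k = *scalar;
    for (std::size_t i = 0; i < n; ++i)
      dst[i] += k;
  } else {
    // Original loop: reload on every iteration to preserve semantics.
    for (std::size_t i = 0; i < n; ++i)
      dst[i] += *scalar;
  }
}
```

Both versions produce the same results as the original loop; the check only decides which one is safe to run.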

Clang commits

Clang now has builtins for the bitreverse intrinsic. r259671.
The option names for profile-guided optimisations with the cc1 driver have been modified. r259811.
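
For anyone wanting to try the bitreverse builtins mentioned above, here is a minimal sketch. It assumes the 32-bit variant is spelled __builtin_bitreverse32 and guards its use with __has_builtin so the code still compiles elsewhere; the helper names reverseBits32 and reverseBits32Fast are my own.

```cpp
#include <cassert>
#include <cstdint>

// Portable reference: reverse the bit order of a 32-bit value.
inline std::uint32_t reverseBits32(std::uint32_t v) {
  std::uint32_t r = 0;
  for (int i = 0; i < 32; ++i) {
    r = (r << 1) | (v & 1u);
    v >>= 1;
  }
  return r;
}

// Uses the compiler builtin when it is available, otherwise the loop above.
inline std::uint32_t reverseBits32Fast(std::uint32_t v) {
#if defined(__has_builtin)
#if __has_builtin(__builtin_bitreverse32)
  return __builtin_bitreverse32(v);
#else
  return reverseBits32(v);
#endif
#else
  return reverseBits32(v);
#endif
}
```

Comparing the two functions on a handful of inputs is a quick way to check that the builtin does what you expect on your toolchain.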

Other project commits

AddressSanitizer now supports iOS. r259451.
The current policy for using the new ELF LLD as a library has been documented. r259606.
Polly's new Sphinx documentation gained a guide on using Polly with Clang. r259767.

LLVM Weekly - #109, Feb 1st 2016

Mon, 01 Feb 2016 06:24:00 +0000

Welcome to the one hundred and ninth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The GNU Tools Cauldron 2016 has been announced for the 9th-11th of September 2016, in Hebden Bridge, UK.

The Sulong project has been announced. It is an LLVM IR interpreter using the Truffle framework and Graal on the JVM to support JIT compilation.

Ehsan Akhgari has posted an update on building Firefox with clang-cl. It is now possible to build a complete Firefox with Clang without using the MSVC fallback once.

I've mentioned it down below in the list of notable commits, but it's worth calling out here too: the old autoconf build-system has now been removed from LLVM. 3.8 will be the last release to include it. Time to switch to CMake if you haven't already.

John Regehr gave a talk about undefined behaviour in LLVM at the Paris LLVM meetup, and you can find the slides here.

On the mailing lists

James Knight has written to the list to get feedback on approaches to cleaning up Clang's handling of atomics. There seems to be widespread support for the cleanup. James followed up again to slightly revise his plan.
Matt Arsenault proposes that all libcalls be canonicalized to intrinsics. All responses so far are in favour.
Ke Bai has shared a proposal on representing multiple memory scopes in LLVM IR. There hasn't been any feedback yet.
Dmitree Kuvaiskii asks if anyone has implemented a pass utilizing Intel's new MPX memory protection. The answer appears to be no, and in addition David and Kostya are sceptical about how worthwhile it would be.
Peter Collingbourne has proposed a new optimisation, virtual constant propagation. The original motivation was to reduce the overhead added by enabling control-flow integrity in certain Chromium benchmarks. Constants will be devirtualized at LTO time.

LLVM commits

The autoconf build system for LLVM has been removed. r258861.
The WebAssembly backend gained support for unaligned loads and stores. r258779.
LLVM's MCAsmStreamer will now always use .p2align rather than .align, because .align's behaviour can differ between targets. r258750.
Intrinsic IDs are now looked up by binary search rather than the previous more complex mechanism. This improves the compile time of Function.cpp. r258774.
TargetSelectionDAGInfo has been renamed to SelectionDAGTargetInfo and now lives in CodeGen rather than Target. r258939.
A LoopSimplifyCFG pass was added to canonicalise loops before running through passes such as LoopRotate and LoopUnroll. r259256.
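
The intrinsic ID change above swaps a more complex lookup for a binary search. The general idea can be sketched as below; this is not LLVM's code, and the table is a tiny, hypothetical sorted list of shortened intrinsic names used purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical name table. It must be kept sorted for binary search to work.
static const char *const kNames[] = {
    "llvm.bitreverse", "llvm.bswap", "llvm.ctlz", "llvm.memcpy", "llvm.sqrt"};

// Returns the index of name in kNames, or -1 if it is not present.
int lookupIntrinsic(const std::string &name) {
  const char *const *begin = std::begin(kNames);
  const char *const *end = std::end(kNames);
  // lower_bound finds the first entry that is not less than name.
  const char *const *it = std::lower_bound(
      begin, end, name, [](const char *entry, const std::string &key) {
        return key.compare(entry) > 0; // true when entry < key
      });
  if (it == end || name != *it)
    return -1;
  return static_cast<int>(it - begin);
}
```

The index doubles as the ID here, so no separate mapping is needed as long as the table order is fixed.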

Clang commits

The clang-cl driver will now warn for unknown arguments rather than erroring, to match the behaviour of MSVC. r258720.
The old autoconf build system was removed from Clang. r258862.
The 'sancov' (SanitizerCoverage) tool gained some documentation. r259000.

Other project commits

libcxx gained an implementation of ostream_joiner. r259014, r259015.
lld gained a new error function which won't cause process exit. The hope is this can be used to provide a gradual path towards lld-as-a-library. r259069.
The lit runner for the LLVM test suite can now be passed --param=profile=perf which will cause each test to be run under perf record. r259051.

LLVM Weekly - #108, Jan 25th 2016

Mon, 25 Jan 2016 04:48:00 +0000

Welcome to the one hundred and eighth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

LLVM 3.8 RC1 has been released. Now is the time to test it out with your favourite projects and report any issues.

The deadline for the EuroLLVM call for papers is today.

Version 1.6 of the Rust programming language was released last week. Rust uses LLVM for its code generation.

The LLVM Social in Paris will be held this week on Wednesday.

On the mailing lists

Quentin Colombet has posted an RFC asking for views on adding a kind of MachineModulePass. Questions include who would be interested and why. John Criswell's response includes some interesting use-cases.
Eduard-Mihai Burtescu has posted an RFC on making byval argument passing work with opaque pointers.
Quentin Colombet has posted an RFC seeking opinions on the contract between LLVM IR and the backends for instruction selection. In particular, should backends be able to perform instruction selection on any valid LLVM IR. Several people have fed back that in practical terms, a backend doesn't need to be able to select any instruction as long as it provides IR to IR transformations that can perform necessary modifications prior to instruction selection.
Jonas Wagner is looking for feedback on how to support self-modifying branches in LLVM. The thread contained some interesting discussion about the cost of well-predicted branches. It would certainly seem worthwhile to delve deeper to see how much of the overhead is due to limitations in LLVM's basic block at-a-time instruction scheduler.
Philip Reames has written to the list to warn users of RewriteStatepointsForGC that there are currently issues expressing arbitrary exceptional control flow. The thread discusses some potential solutions to the current issues.
Ed Maste has been working to use libunwind in FreeBSD's base system, and queries its stack usage. LLVM's libunwind allows for 120 saved registers on all architectures, while in contrast the GCC unwinder has a target-dependent maximum (it's 18 on x86-64).

LLVM commits

llvm::SplitModule gained a new flag which can be used to cause it to attempt to split the module without globalizing local objects. r258083.
The WebAssembly backend will now rematerialize constants with multiple uses rather than holding them live in registers, as there is no code size saving in using registers for constants in most cases in the WebAssembly encoding. r258142.
Some small patches from the global instruction selection effort have started to land, such as the introduction of a generic machine opcode for ADD (G_ADD) and the all-important CMake support for building it. r258333, r258344.
getCacheLineSize was added to TargetTransformInfo. It's currently only used by PPCLoopDataPrefetch. r258419.
LoopIdiomRecognize improved in its ability to recognise memsets. r258620.

Clang commits

A number of new AST matchers were added. r258042, r258072, and more.
The LeakSanitizer documentation has been updated with a usage example. r258476.

Other project commits

The new ELF linker gained initial support for MIPS local GOT (global offset table) entries. r2583888.
The LLVM test suite now contains a ClangAnalyzer subdirectory containing tests for the static analyzer. r258336.

LLVM Weekly - #107, Jan 18th 2016

Mon, 18 Jan 2016 05:32:00 +0000

Welcome to the one hundred and seventh issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

I have a very exciting piece of non-LLVM news to share this week. On Saturday I proposed to my partner Carrie Anne, and I'm delighted to report that she said yes. You may well question if this piece of personal news has any relevance to you, and in response I'd like to highlight just how important Carrie Anne is to this weekly newsletter. For over two years now, I've given up 2-3+ hours of my time every week without fail on evenings and weekends, time we could really be spending together as a couple. Without Carrie Anne's understanding and support LLVM Weekly couldn't exist. 2016 is going to be a very exciting year.

News and articles from around the web

Registration is now open for EuroLLVM 2016. The conference will be held in Barcelona on March 17th-18th. The call for papers closes on January 25th.

Registration is open for the Clang/LLVM development sprint to be held on the weekend of Feb 6th/7th at Bloomberg's London and New York offices.

The next Cambridge LLVM social will be held on Wednesday 20th January at 7.30pm, and will be colocated with the FreeBSD social.

On the mailing lists

Rui Ueyama has run benchmarks for every LLD commit. He observes the linker in general is getting slightly slower over time as it gains more functionality, but that no commit seems to increase link time without a justifiable reason.
Chris Bieneman has posted an RFC on removing autoconf from the trunk. The proposal is to remove it from the repository on January 26th. There don't seem to be any objections to this timeline so far.
Krzysztof Parzyszek has shared a description of the recently committed register data-flow framework.
Discussion has been ongoing in the thread about global instruction selection, particularly surrounding the semantics of bitcasts on big-endian systems. James Molloy has posted to clarify the current behaviour.
JF Bastien has posted an RFC on supporting non-temporal fencing in LLVM IR.
Quentin Colombet is seeking feedback on the best way to map LLVM IR values to MachineInstr values as part of his GlobalISel work.
The branch for LLVM 3.8 has now been created. The first release candidate should come soon.
John McCall has posted an RFC on enforcing pointer type alignment in Clang. The RFC proposes the following: "It is not undefined behavior to create a pointer that is less aligned than its pointee type. Instead, it is only undefined behavior to access memory through a pointer that is less aligned than its pointee type."
Philip Reames has posted a note for anyone using RewriteStatepointsForGC. There's been a recent change in the handling of vectors of pointers.
Derek Schuff has followed up to the previous discussion about allowing virtual registers after register allocation with some more thoughts after some offline discussions.
Hans de Goede is working on a backend for TGSI (Tungsten Graphics Shader Infrastructure, as used by Gallium) and has a number of questions.
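
To make the pointer alignment rule quoted above a little more concrete: under that rule, merely forming an address at an odd offset is fine, and it is the access through an under-aligned typed pointer that is the problem. A minimal sketch of the usual way to read such data safely is below; the function name loadU32 is hypothetical and the example is mine rather than anything from the RFC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads a 32-bit value stored at an arbitrary byte offset. Dereferencing
// reinterpret_cast<const std::uint32_t *>(bytes + offset) would access memory
// through a possibly under-aligned pointer; copying the bytes avoids that.
std::uint32_t loadU32(const unsigned char *bytes, std::size_t offset) {
  assert(bytes != nullptr);
  std::uint32_t value;
  std::memcpy(&value, bytes + offset, sizeof value);
  return value;
}
```

Compilers typically turn the memcpy into a single load on targets that allow unaligned access, so there is rarely a cost to writing it this way.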

LLVM commits

The ORC JIT API now supports remote JITing over an RPC interface to a separate process. The LLI tool has been updated to use this interface. r257305, r257343.
The Hexagon backend gained a target-independent SSA-based data flow framework for representing data flow between physical registers and passes using this to implement register liveness analysis, dead code elimination, and copy propagation. r257447, r257480, r257485, r257490.
The documentation on committing code reviewed on Phabricator to trunk has been improved. r257764.
WebAssembly gained a prototype instruction encoder and disassembler based on a temporary binary format. r257440.
LLVM's MathExtras gained a SaturatingMultiplyAdd helper. r257352.
llvm-readobj has much-expanded support for dumping CodeView debug info. r257658.
The code that finds code sequences implementing bswap or bitreverse and emits the appropriate intrinsic has been rewritten. r257875.
The AMDGPU backend gained a new machine scheduler for the Southern Islands architecture. r257609.
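
As an illustration of what a saturating multiply-add helper does, here is a small sketch. It assumes the operation is x * y + a clamped to the type's maximum value, and it is restricted to unsigned 32-bit inputs for simplicity. The name saturatingMulAdd is hypothetical and this is not the MathExtras implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Computes x * y + a, returning the largest 32-bit value instead of wrapping.
std::uint32_t saturatingMulAdd(std::uint32_t x, std::uint32_t y,
                               std::uint32_t a) {
  const std::uint64_t max = std::numeric_limits<std::uint32_t>::max();
  // The 64-bit intermediate cannot overflow:
  // (2^32 - 1)^2 + (2^32 - 1) is still below 2^64.
  std::uint64_t wide = static_cast<std::uint64_t>(x) * y + a;
  assert(wide >= a);
  return static_cast<std::uint32_t>(wide > max ? max : wide);
}
```

Widening to 64 bits keeps the sketch short; a generic version for arbitrary unsigned types would need explicit overflow checks instead.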

Clang commits

A Python implementation of scan-build has been added. r257533.
The 'interrupt' attribute is now supported on x86. r257867.
Clang learned to respond to the -fsanitize-stats flag. It can currently only be used with control-flow integrity and allows statistics to be dumped. r257971.

Other project commits

The compiler-rt CMake buildsystem gained experimental support for tvOS and watchOS. r257544.
Initial support was added for PPC and the new ELF linker. r257374.
The CMake and Lit runners in the LLVM test-suite can now support the integer C and C++ tests from SPEC CPU2006. r257370.

LLVM Weekly - #106, Jan 11th 2016

Mon, 11 Jan 2016 05:15:00 +0000

Welcome to the one hundred and sixth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

Many readers may be interested to hear that last week was the 3rd RISC-V Workshop. You can find slides from the two lowRISC talks here and here. You may also want to read my liveblog of the event.

News and articles from around the web

The BSD Now podcast recently interviewed Alex Rosenberg about his work on LLVM/Clang and FreeBSD.

The folks at QuarksLab have shared a Clang hardening cheat sheet.

LLDB 3.8 will feature initial Go debugging support.

The next Paris LLVM Social will be held on January 27th and includes a talk from John Regehr.

The next Zurich LLVM Social will be taking place on January 14th.

On the mailing lists

One long discussion on the lists this week was regarding the design of the new LLD ELF and COFF linkers. Rui Ueyama notes that they are currently designed as commands instead of libraries. e.g. exit() on failure is considered appropriate. Chandler Carruth argues strongly that supporting a library interface is important for any project under the LLVM umbrella. There ultimately seems to be agreement that the ability to use the linker as a library is important for some use cases, but also that the person doing the work (Rui) should be able to go about development in the way that makes most sense to him. He intends to focus first on reaching feature parity with the GNU linker, and then look at issues such as the library interface.
There's been some discussion in the thread on the new global instruction selection implementation about the semantics of inttoptr and ptrtoint. Specifically whether the currently specified semantics are appropriate for architectures where converting between integers and pointers is not a no-op. It looks like Philip Reames, David Chisnall, and others should share an RFC on this issue in the following weeks.
Matthew Arsenault asks about TargetTransformInfo getOperationCost. Hal Finkel points out the problems Matt is seeing are likely down to there being two cost models: one used for vectorization, and one used for inlining and unrolling.
James Byerly has an interesting question about constraint solving and unspillable register classes. He is targeting a custom, seemingly micro-coded architecture and hasn't received any responses so far.
Steve King has shared some thoughts on adding ISD::OPAQUE to complement ISD::BITCAST.
What is the current status of LLDB on Windows? Zachary Turner has written a summary.
Both Xilinx and Microsoft Research Cambridge are advertising intern positions on LLVM-related projects.

LLVM commits

LLVM gained the -print-funcs option which can be used to filter IR printing to only certain functions. r256952.
The LLVM ADT library gained a new sum type abstraction for pointer-like types and an abstraction for embedding an integer within a pointer-like type. r257282, r257284.
LLVM now recognises the Samsung Exynos M1 core. r256828.
InstCombine learned to expose more constants when comparing getelementptrs (GEPs) by detecting when both GEPs could be expressed as GEPs with the same base pointer. r257064.
SelectionDAGBuilder will set NoUnsignedWrap for an inbounds getelementptr and for load/store offsets. r256890.
AArch64 MachineCombine will now allow fadd and fmul instructions to be reassociated. r257024.
Macro emission in DWARFv4 is now supported. r257060.
llvm-symbolizer gained the -print-source-context-lines option to print source code around the line. r257326.

Clang commits

Clang's CMake build system can now perform a multi-stage bootstrap build with profile-guided optimisation. r256873.
Clang's command line frontend learned to handle a whole bunch of -fno-builtin-* arguments. r256937.
The new ELF LLD linker will now be used for the AMDGPU target. r257175.

Other project commits

The performance of string table construction in the LLD ELF linker has been improved. This improves link time of lld by 12% from 3.50 seconds to 3.08 seconds. r257017.
The LLD ELF linker gained support for the AMDGPU target. r257023.

LLVM Weekly - #105, Jan 4th 2016

Mon, 04 Jan 2016 06:07:00 +0000

Welcome to the one hundred and fifth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

Happy new year! This issue marks the second anniversary of LLVM Weekly. It's rather short as the past week has been very quiet, with most LLVM developers seemingly taking a break over the holidays. My colleague Wei Song and I will be presenting about lowRISC at the 3rd RISC-V workshop on Wednesday this week. Do say hi if you're going to be there.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Sanjoy Das has written a blog post about issues with LLVM's undef value. Interestingly, he provides an example where undef can actually inhibit optimisations.

On the mailing lists

Devin Coughlin provided a really useful and detailed guide to how you might implement a lifetime checker in the Clang static analyzer.
There have been some questions raised about what happened to the LLVM/Clang 3.7.1 release. Everything is ready to go; it's just waiting for the release manager to push the button.
Rahman Lavaee Mashhadi has been experimenting with disabling function alignment. He observes this results in a segfault on some programs, which David Chisnall points out is because of the C++ ABI using low-bits on pointers.
Dan Liew has posted some feedback on the Arcanist/Phabricator work-flow. This has resulted in a new patch up for review improving the LLVM Phabricator documentation.
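
The function alignment discussion above turns on the trick of storing a flag in the low bit of an address. My understanding is that the Itanium C++ ABI does this for pointers to member functions, but treat that as an assumption. The sketch below only shows the general trick, with made-up names, and why it depends on addresses being at least 2-byte aligned.

```cpp
#include <cassert>
#include <cstdint>

// An address with a one-bit tag stored in its least significant bit.
struct TaggedAddr {
  std::uintptr_t bits;
};

// Packing is only sound if the address has a zero low bit to begin with.
inline TaggedAddr packTagged(std::uintptr_t addr, bool tag) {
  assert((addr & 1u) == 0 && "address must be at least 2-byte aligned");
  return TaggedAddr{addr | static_cast<std::uintptr_t>(tag ? 1 : 0)};
}

// Recover the original address by masking off the tag bit.
inline std::uintptr_t addressOf(TaggedAddr t) {
  return t.bits & ~static_cast<std::uintptr_t>(1);
}

// Recover the tag.
inline bool tagOf(TaggedAddr t) { return (t.bits & 1u) != 0; }
```

If functions are allowed to start at odd addresses, an untagged odd address is indistinguishable from a tagged even one, which is consistent with the crashes described above.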

LLVM commits

The -align-all-loops and -align-all-functions arguments have been introduced to force function or loop alignments for testing purposes. r256571.
The x86 backend has added intrinsics for reading and writing to the flags register. r256685.

Clang commits

Various Clang classes have been converted to use the TrailingObjects helper. r256658, r256659, and more.
__readeflags and __writeeflags intrinsics are exposed in Clang. r256686.

Other project commits

In libcxx, undefined behaviour in <list> has been fixed for builtin pointer types and support added for the next ABI version. r256652.

LLVM Weekly - #104, Dec 28th 2015

Mon, 28 Dec 2015 07:50:00 +0000

Welcome to the one hundred and fourth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The schedule for the LLVM devroom at FOSDEM has been published. This will be on January 30th 2016 in Brussels at FOSDEM.

Andy Finnell spent some time over the Christmas vacation porting the LLVM Kaleidoscope tutorial to Erlang and has kindly shared the fruits of his labours.

Richard Pennington has written another blog post about ELLCC, this time about using it to cross-compile the Linux kernel for the Raspberry Pi.

Tim Jones (lecturer at the University of Cambridge Computer Laboratory) has written about the alias analysis used in the HELIX compiler. There's nothing LLVM-specific here, indeed it was implemented using ILDJIT, but it should be of general interest to compiler developers.

On the mailing lists

Keno Fischer has posted a proposal for multi-location debug info support in LLVM IR. This would allow, for instance, modelling when a variable is available either on the stack or in a register.
Adam Nemet is proposing to extend the PowerPC software prefetching pass to work on other targets, specifically AArch64.
Russel Wallace asks for advice on finding all pointers to functions, and had a number of suggestions in response.
For a while now, Galina Kistanova has been posting statistics from the LLVM buildbots. This includes the number of commits for each project, the number of failed builds, and average build time. I haven't linked to it before, so felt I should rectify that.

LLVM commits

An initial implementation of an LLVMCodeView library has landed. This implements support for emitting debug info in the CodeView format. r256385.
lit has gained support for a per-test timeout which can be set using --timeout=. r256471.
All uses of edge weights in BranchProbabilityInfo have been replaced with probabilities. r256263.
The LLVM project documentation on patch reviews via Phabricator now has advice on choosing reviewers. r256265.
The gc.statepoint intrinsic's return type is now a token type rather than i32. r256443.

Clang commits

ASTTemplateKWAndArgsInfo and ASTTemplateArgumentListInfo have been converted to use the TrailingObjects header. This abstracts away reinterpret_cast, pointer arithmetic, and size calculations needed for the case where a class has some other objects appended to the end of it. r256359.
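
For readers who haven't met the pattern that TrailingObjects abstracts, the hand-written version looks roughly like the sketch below. The class IntList is hypothetical and deliberately does not use the TrailingObjects API. It shows the reinterpret_cast, pointer arithmetic, and size calculation that the helper is meant to hide.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical class: a header followed, in the same allocation, by count ints.
class IntList {
  std::size_t count;

  explicit IntList(std::size_t n) : count(n) {}

  // The trailing array starts directly after the header object.
  int *trailing() { return reinterpret_cast<int *>(this + 1); }

public:
  static IntList *create(std::size_t n) {
    static_assert(sizeof(IntList) % alignof(int) == 0,
                  "trailing ints must be suitably aligned");
    // One allocation sized for the header plus the trailing elements.
    void *mem = ::operator new(sizeof(IntList) + n * sizeof(int));
    IntList *list = new (mem) IntList(n);
    for (std::size_t i = 0; i < n; ++i)
      new (list->trailing() + i) int(0);
    return list;
  }

  static void destroy(IntList *list) {
    list->~IntList();
    ::operator delete(list);
  }

  std::size_t size() const { return count; }

  int &operator[](std::size_t i) {
    assert(i < count);
    return trailing()[i];
  }
};
```

Every class written this way repeats the same size and alignment bookkeeping, which is the motivation for factoring it into a shared helper.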

Other project commits

Development of LLD's new ELF linker is continuing, with support for new relocations on x86, x86-64, and MIPS. r256143, r256144, r256172, r256416.

LLVM Weekly - #103, Dec 21st 2015

Mon, 21 Dec 2015 11:42:00 +0000

Welcome to the one hundred and third issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

Regular readers will know about lowRISC, a not-for-profit project a group of us founded aiming to produce a complete open-source System-on-Chip in volume. We've just hit a new milestone with the untethering of the base SoC. If you're interested in contributing, the blog post contains a number of potential starting points.

News and articles from around the web

The 6th EuroLLVM conference will be held on March 17th-18th in Barcelona, Spain. The call for papers is now open and will remain open until January 25th 2016.

Chandler Carruth's keynote, "Understanding compiler optimizations" from the Meeting C++ 2015 conference is now online.

Richard Pennington has blogged about bootstrapping LLVM and Clang using pre-compiled ELLCC binaries.

Bloomberg is going to be holding a weekend Clang and LLVM hackathon in NYC and in London on February 6th and 7th. The event will be open to everyone in the community and Bloomberg will provide space, power, food, beverages, and internet access. They're looking for experienced Clang and LLVM developers to help as mentors.

On the mailing lists

Marshall Clow proposes dropping support in libc++ for GCC versions prior to 4.7. Eric Fiselier suggests that actually GCC 4.7 and 4.8 have a rather large number of test failures and 4.9 would be a more sensible requirement.
Dmitry Polukhin has proposed an RFC on supporting GCC's ifunc attribute. He proposes three potential approaches and feedback so far prefers the second.
Easwaran Raman has posted an RFC on hotness thresholds in profile-guided optimisation. The proposal attempts to define a way of determining hot blocks that works both for programs with few hot-spots and a long tail of frequently executed blocks.

LLVM commits

LLVM IR now supports floating point atomic loads and stores. r255737.
New attributes have been introduced: InaccessibleMemOnly (a function may only access memory that is not accessible by the module being compiled) and InaccessibleMemOrArgMemOnly (a function may only access memory that is either not accessible by the module being compiled or is pointed to by its pointer arguments). r255778.
The PowerPC backend gained support for soft float operations on ppc32. r255516.
The terminatepad instruction has been removed from LLVM IR. r255522.
IR call instructions can now take a fast-math flags marker which indicates fast-math flags may allow otherwise unsafe optimisations. r255555.
LLVM gained a C++11 ThreadPool in its internal library. It is intended to be used for ThinLTO. r255593.
The default set of passes has been adjusted. mem2reg will not be run immediately after globalopt and more scalar optimization passes have been added to the LTO pipeline. r255634.
The llvm-profdata tool now supports specifying a weight when merging profile data. This can be used to give more relative importance to one of multiple profile runs. r255659.
For CMake builds, a compile_commands.json file will now be generated which tells tools like YouCompleteMe and clang_complete how to build each source file. r255789.
The Hexagon VLIW packetizer saw a large update (though unfortunately the changes aren't summarised in the commit message). r255807.
A number of LLVM's C APIs have been deprecated: LLVMParseBitcode, LLVMParseBitcodeInContext, LLVMGetBitcodeModuleInContext and LLVMGetBitcodeModule. These have been replaced with new versions of the functions which don't record a diagnostic. r256065.
The AVR backend (which is being imported incrementally) gained AVR.td and AVRRegisterInfo.td. r256120.
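
To illustrate what weighting means when merging profile data, as in the llvm-profdata entry above, here is a minimal sketch. It assumes each counter is simply scaled by an integer weight before being summed, ignores overflow, and uses a made-up Profile type. It is not the tool's implementation or file format.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical profile: execution counts keyed by function name.
using Profile = std::map<std::string, std::uint64_t>;

// Adds every counter from src into dst, scaled by weight.
void mergeWeighted(Profile &dst, const Profile &src, std::uint64_t weight) {
  assert(weight >= 1);
  for (const auto &entry : src)
    dst[entry.first] += entry.second * weight;
}
```

Merging one run with weight 1 and another with weight 3 makes the second run count three times as much towards the combined totals.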

Clang commits

A new checker has been introduced to detect excess padding in classes and structs. r255545.
A new control-flow integrity mode was introduced: cross-DSO CFI allows control flow to be protected across shared objects. It is currently marked experimental. r255694.
Clang's CMake build system now supports generating profile data for Clang. r255740, r256069.

Other project commits

It is now possible to suppress reports from UndefinedBehaviorSanitizer for certain files, functions, or modules at runtime. r256018.
The llvm test-suite's CMake+Lit runner gained support for SPEC2000 and SPEC CPU95. r255876, r255878.

LLVM Weekly - #102, Dec 14th 2015

Mon, 14 Dec 2015 04:43:00 +0000

Welcome to the one hundred and second issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Version 1.5 of the Rust programming language has been released. Rust of course uses LLVM as its backend.

George Balatsouras has written a blog post on compiling a project using autotools to LLVM bitcode.

On the mailing lists

Derek Schuff kicked off a discussion about whether virtual registers should be allowed after register allocation for targets with infinite virtual register sets. For targets such as WebAssembly and NVPTX, it of course doesn't make sense to have a fixed size register file. A number of people raised concerns that using virtual registers after register allocation seems like a hack that could result in difficult corner cases, or suggested that supporting infinite (or at least growable) physical register sets might be an interesting alternative. Matthias Braun gave a really good summary of the issues.
Discussion has continued on adding an HasInaccessibleState attribute. Vaivaswatha Nagaraj summarised the key points of the discussion so far while Joseph Tremoulet shared some thoughts based on his experience on the Microsoft Phoenix compiler.
Alexander Riccio is interested in feedback on his proposal to integrate more static analysis tests. He's looking to import code published by NIST.
Philip Reames has posted an RFC on extending atomic loads and stores to floating point and vector types. Feedback appears to be positive.
Hans Wennborg has proposed a schedule for the 3.8 release. Under this proposal, 3.8 would be branched on the 13th of January 2016 with a final release targeted for 18th February.
Craig Topper has provided a useful description of how patterns are ordered by TableGen.
David Li has posted an update detailing remaining steps for size reduction of profile-guided optimisation.
When writing your own backend, how should you handle checking the range of immediates for your assembly parser? Alex Bradbury explains how.

LLVM commits

A new minimum spanning tree based method of instrumenting code for profile-guided optimisation was added. This guarantees the minimum number of CFG edges are instrumented. r255132.
MatchBSwap in InstCombine will now also detect bit reversals. r255334.
Sample-based profile-guided optimisation memory usage has been reduced by 10x by changing from using a DenseMap for sample records to a std::map. r255389.
An Instruction::getFunction method was added. It's perhaps surprising this didn't exist before. r254975.
FP16 vector instructions defined in ARMv8.2-A are now supported. r255010.
The EarlyCSE (common subexpression elimination) pass learned to perform value forwarding for unordered atomics. r255054.
Debug info in LLVM IR can now refer to macros. r255245.
LLVM's developer policy has been updated to detail the currently accepted C API stability policy and other guidelines. r255300.
A massive rework of funclet-oriented exception handling (needed for Windows exceptions) has landed. r255422.

Clang commits

Clang gained an option to use the new ThinLTO pipeline. r254927.
Hexagon will use the integrated assembler by default. r255127.
dllexport and dllimport attributes are now exposed through the libclang API. r255273.

Other project commits

ThreadSanitizer gained initial support for PPC64. r255057.

LLVM Weekly - #101, Dec 7th 2015

Mon, 07 Dec 2015 03:52:00 +0000

Welcome to the one hundred and first issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The implementation of the Swift programming language is now open source. Rather than being a simple code dump, development will now occur out in the open with external contributions encouraged. If you haven't already, now might be a good time to watch Joseph Groff and Chris Lattner's talk on the Swift Intermediate Language.

Rui Ueyama wrote about the new LLD ELF linker on the official LLVM blog.

The Visual C++ team have released Clang with Microsoft CodeGen. This uses the Clang parser along with the code generator and optimizer from Visual C++. The majority of the Clang and LLVM changes will be contributed back upstream.

Alex Denisov wrote about using the LLVM API with Swift.

If you haven't already submitted your talk proposal for the LLVM devroom at FOSDEM, you've now got a little more time. Get your submission in by this Friday.

On the mailing lists

Swift team members have started discussions about upstreaming their changes to LLVM, Clang, and LLDB. The Clang changes include the addition of an 'API notes' feature which has seen some interest from other developers. This can be used to associate certain attributes with functions from system headers through an external YAML file, which is of course much more pragmatic than expecting system headers on all supported platforms to be updated.
Vaivaswatha Nagaraj observes that malloc and realloc don't have the doesNotAccessMemory/onlyReadsMemory attributes set, and that this makes GlobalsAA much less effective. As was pointed out in the ensuing discussion, these attributes wouldn't be correct for malloc and realloc but it would perhaps make sense to add a new attribute. Vaivaswatha penned an RFC on an HasInaccessibleState attribute. This has generated a lot of discussion and alternate proposals but no conclusion yet.
Christof Douma has posted an RFC on adding execute only support to the ARM code generator. This means the compiler will not generate data accesses into the code section.
Oliver Stannard has posted an RFC on supporting position-independent code on ARM for small embedded systems. In read-only position independence (ROPI), code and read-only data are accessed PC-relative with the offsets known at static link time. In read-write position independence (RWPI), read-write data is accessed relative to the static base register (R9).
This week's bikeshedding thread is on the naming convention for LLVM intrinsics. The proposal is to standardise on using . as a separator. There are some suggestions that _ be allowed in words.
How can you recompile functions at different optimisation levels? Lang Hames provides a sample doing this using the Orc API.

LLVM commits

llc and opt gained an option to run all passes twice. This is intended to help show up bugs that occur when using the same pass manager to compile multiple modules. r254774.
An initial prototype for llvm-dwp has been committed. This will eventually be a tool for building a DWARF package file out of a number of .dwo split debug files. r254355.
All weight-based interfaces in MachineBasicBlock have now been replaced with probability-based interfaces. r254377.
LLVM's STLExtras gained a range-based version of std::any_of and std::find. r254391, r254390.
llvm.get.dynamic.area.offset.{i32,i64} intrinsics have been added. These can be used to get the address of the most recent dynamic alloca. r254404.
The X86 backend gained a new pass to reduce code size by removing redundant address recalculations for LEA. r254712.
The WebAssembly backend now has initial support for varargs. r254799.

Clang commits

Design docs have been added for forward-edge CFI for indirect calls. r254464.
The pass_object_size attribute was added to Clang. This is intended to be used to work around cases where __builtin_object_size doesn't function. r254554.
Documentation was added for UndefinedBehaviorSanitizer. r254733.

Other project commits

LLD now supports the R_MIPS_HI16/LO16 relocations. r254461.
libomp can now make use of libhwloc on Unix to discover topology of the host system. r254320.

New ELF Linker from the LLVM Project

Mon, 30 Nov 2015 07:26:00 +0000

We have been working hard for a few months now to rewrite the ELF support in lld, the LLVM linker. We are happy to announce that it has reached a significant milestone: it is now able to bootstrap LLVM, Clang, and itself and pass all tests on x86-64 Linux and FreeBSD with the speed expected of an LLVM project.

ELF is the standard file format for executables on Unix-like systems, such as Linux and BSDs. GNU ld and GNU gold are commonly used linkers for such systems today. In many use cases, the linker is a black box for which only speed matters. Depending on program size, linking a program takes from tens of milliseconds to more than a minute. We designed the new linker so that it runs as fast as possible. Although no serious benchmarking or optimization has been conducted yet, it is consistently observed that the new lld links the LLVM/Clang/lld executables in about half the time of GNU gold. Generated executables are roughly the same size. lld is not at feature parity with gold yet, so it is too early to make a conclusion, but we are working hard to maintain or improve lld's speed while adding more features.

lld is command-line compatible with GNU ld so that it can be used as a drop-in replacement. This does not necessarily mean that we are implementing all the features of the GNU linkers in the same way as they did. Some features are no longer relevant for modern Unix-like systems and can be removed. Some other features can be implemented in more efficient ways than those in the traditional linkers. Writing a new linker from scratch is a rare occasion. We take advantage of this opportunity to simplify the linker while keeping compatibility with the existing linkers for normal use.

The new ELF linker is a relatively small program which currently consists of about 7000 lines of C++ code. It is based on the same design as the PE/COFF (Windows) support in lld, so the design document for the PE/COFF support is directly applicable to the ELF support.

The older ELF support still exists in the lld repository in parallel with the new one. Please be careful not to confuse the two. They are separated at the top directory and do not share code. You can run the new linker with the ld.lld command or by passing -fuse-ld=lld to Clang when linking.

We are still working on implementing remaining functionality such as improved linker script support or improved support for architectures beyond x86_64. If you are interested in the new linker, try it out for yourself.

LLVM Weekly - #100, Nov 30th 2015

Mon, 30 Nov 2015 04:41:00 +0000

Welcome to the one hundredth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

Eagle-eyed readers will note we've now reached issue 100, marking 100 weeks of uninterrupted service and of course meaning there's just 28 weeks to go until an important numerical milestone.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

There is going to be an LLVM Devroom at FOSDEM next year and the call for proposals closes on December 1st. Get your submissions in!

Most slides from the recent LLVM in HPC workshop have now been posted.

Jeff Trull has posted a great blog post on fuzzing C++ code with AFL and libFuzzer.

On the mailing lists

The upcoming removal of the autoconf build system came up on the mailing list again. Chris Bieneman explains the policy on blocking vs non-blocking bugs for this. If you think you're likely to be affected, now is a very good time to kick the tires on CMake.
Geoffrey Romer is interested in adding the ability to customise the behaviour of std::hash and is looking for feedback.
Is it possible to use a static base register on ARM rather than PC-relative addressing? Oliver Stannard has a patch for this which should be upstreamed soon.
Rail Shafigulin is looking for information on how slots are assigned for packets in Hexagon. As usual, Krzysztof Parzyszek provides some useful answers.

LLVM commits

A number of patches related to ARMv8.2-A have landed. Public documentation doesn't seem to have been released for this architecture revision, but the patches indicate some of the new features including: persistent memory instruction and FP16 instructions. You can see the patches still in review here. r254156, r254198.
A series of helper functions from SelectionDAGNodes have been exposed (isNullConstant, isNullFPConstant, isAllOnesConstant, isOneConstant). These helpers can help simplify code in your target's ISelLowering. r254085.
The WebAssembly backend's block placement algorithm has been improved. r253876.
Tests generated from utils/update_llc_test_checks.py are now marked as autogenerated. r253917.

Clang commits

DataRecursiveASTVisitor has been removed, and RecursiveASTVisitor can be used in its place. This resulted in the removal of 2912 lines of code. r253948.
Sparc and SparcV9 default to using an external assembler again. r254199.
Functions with the interrupt attribute are now supported for mips32r2+. r254205.

Other project commits

A single DataFlowSanitizer or ThreadSanitizer-instrumented binary can now run on both 39-bit virtual address space and 42-bit virtual address space AArch64 platforms. r254151, r254197.
lldb gained a swig_bot.py for generating bindings. r254022.

LLVM Weekly - #99, Nov 23rd 2015

Mon, 23 Nov 2015 06:20:00 +0000

Welcome to the ninety-ninth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

LLVM/Clang 3.7.1-rc2 has been tagged. As always, help testing is appreciated.

Clasp 0.4 has been released. Clasp is a new Common Lisp implementation that uses LLVM as a compiler backend and aims to offer seamless C++ interoperation.

On the mailing lists

Quentin Colombet has shared a plan for moving forwards with global instruction selection, as proposed in his Dev Meeting talk. There's a lot of enthusiasm for this work, though some questions about how in practical terms the development should proceed and be tested. There is also hope that this new work will allow the distinction between integers and pointers to be preserved through to MachineInstructions. This is useful both for GC and for architectures where pointers aren't integers.
Eric Christopher has shared a summary of discussions from the recent Birds of a Feather discussion on the LLVM C API. This includes proposed policy for stability guarantees and extending the APIs.
Ed Maste has been experimenting with linking the FreeBSD base system with lld. With a few extra patches he's managed to link the whole FreeBSD userland.
Artem Dergachev has shared some minutes from a call about summary-based inter-procedural analysis for Clang's static analyser.
Steve King is concerned about recent code size regressions with Os. The issue was bisected to recent changes to the heuristic for merging conditional stores. James Molloy, who authored the patch in question, suggests more investigation is necessary.
Rail Shafigulin is working on a custom VLIW architecture and has had a number of questions about the DFAPacketizer. Krzysztof Parzyszek has provided useful answers each time - well worth a read of these threads if you're doing any work with VLIW or want to learn more about DFAPacketizer.
Nick Johnson pointed out an interesting potential bug in the LiveVariables pass. There haven't been any responses yet, but he has followed up with a patch to fix the issue.
Amjad Aboud has posted a detailed RFC on ensuring LLVM debug info supports all lexically scoped entities. He includes a simple example which shows where block-local typedefs or class definitions can lead to problems.

LLVM commits

Initial support for value profiling landed. r253484.
It is now possible to use the -force-attribute command-line option for specifying a function attribute for a particular function (e.g. norecurse, noinline etc). This should be very useful for testing. r253550.
The WebAssembly backend gained initial prototype passes for register coloring (on its virtual registers) and register stackifying. r253217, r253465.
The built-in assembler now treats fatal errors as non-fatal in order to report all errors in a file rather than just the first one encountered. r253328.
As discussed on the mailing list last week, lane masks are now always precise. r253279.
Support for prelinking has been dropped. See the commit message for a full rationale. r253280.
llvm-lto can now be used to emit assembly rather than object code. r253622, r253624.

Clang commits

Clang should now be usable for CUDA compilation out of the box. r253389.
When giving the -mcpu/-march options to Clang targeting ARM, you can now specify +feature. r253471.

Other project commits

Compiler-rt gained support for value profiling. r253483.
The 'new ELF linker' is now the default ELF linker in lld. r253318.
The LLVM test suite gained support for running SPEC2000int and SPEC2006int+fp with PGO and reference inputs. r253362.

LLVM Weekly - #98, Nov 16th 2015

Mon, 16 Nov 2015 04:18:00 +0000

Welcome to the ninety-eighth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

This week's issue comes to you from Vienna where I'm just about to head home from a short break (so apologies if it's a little later than usual and perhaps a little less detailed). I'll admit that nobody has actually written in to beg that LLVM Weekly share travel tips, but I will say that Vienna is a beautiful city that's provided lots to do over the past few days. If you're visiting, I can strongly recommend Salm Bräu for good beer and food.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

All of the LLVM Dev Meeting Videos are now up, and will stay up. This includes Chris Lattner and Joseph Groff's talk on Swift's IR. You can also find most of the slides here. The folks at Quarkslab have also posted a trip report.

The big news this week is that code derived from NVIDIA's PGI Fortran compiler is to be open-sourced and a production-grade Fortran front-end to LLVM produced. This project is a collaboration between the US NNSA (National Nuclear Security Administration), NVIDIA, and the Lawrence Livermore, Sandia, and Los Alamos national laboratories. Hal Finkel has shared a little more on the LLVM mailing list. With a source code release not due for about another year, where does this leave the existing Flang efforts? The hope is that parts of Flang will be merged with the PGI release. Douglas Miles from the PGI team has also shared a mini-FAQ.

Bjarne Stroustrup has shared a detailed trip report from the last C++ Standards Meeting.

This post over at the Include Security Blog delves into some details of support for the SafeStack buffer overflow protection in LLVM.

At the official LLVM blog, a new post gives a very useful guide on how to reduce your testcases using bugpoint and custom scripts. As the post notes, bugpoint is a very powerful tool but can be difficult to use.

On the mailing lists

Do you maintain an out-of-tree target? Does your out-of-tree target have a huge number of subregisters and depends on imprecise lanemasks being available? If so, Matthias Braun wants to hear from you. Speak up now if the proposed change may affect you.
Geoff Berry is proposing some more work on devirtualization. In particular, he wants to propagate llvm.assume across function calls. He also asks what else is required to enable Clang's -fstrict-vtable-pointers by default, which Piotr Padlewski and Richard Smith provide detailed responses to.
Cong Hou has posted an RFC on adding a vector reduction add instruction to LLVM IR. There hasn't been much feedback yet, but David Li questions whether the effect could be modelled with simpler instructions/intrinsics.
Ben Langmuir has posted an RFC on whether modules specified in a module map file should shadow implicitly discovered modules.

LLVM commits

LLVM's autoconf-based build system is now officially deprecated, with the CMake build system being preferred. r252520.
Do you want to compile CUDA code with Clang and LLVM? There's now some handy documentation describing how to do so. See also Jingyue's talk from the recent LLVM Dev Meeting. r252660.
A simple MachineInstruction SSA pass for PowerPC has been added. The implementation is short and straight-forward, so worth a read if you want to do some MI-level peephole optimisations for your target. r252651.
Basic support for AArch64's address tagging has been added. In AArch64, the top 8 bits of an address can be used to store extra metadata with these bits being masked out before going through address translation. r252573.
The Hexagon backend now supports assembly parsing. r252443.
The CMake build system gained a new LLVMExternalProjectUtils module. As an example, this is used with the LLVM test suite which can be set up to be rebuilt whenever the in-tree clang or lld change. This could also be used with compiler-rt or libcxx. r252747.
An 'empty token' is now defined (written as token empty) for when using tokens in LLVM IR. r252811.
LibFuzzer gained a new experimental search heuristic, drill. As the comment in FuzzerLoop.cpp explains, this will 1) read+shuffle+execute+minimize the corpus, 2) choose a random unit, 3) reset the coverage, 4) start fuzzing as if the chosen unit was the only element of the corpus, 5) reset the coverage again when done, 6) merge the newly created corpus into the original one. r252838.
A BITREVERSE SelectionDAG node and a set of llvm.bitreverse.* intrinsics have been introduced. The intention is that backends should no longer have to reimplement similar code to match instruction patterns to their own ISA's bitreverse instruction. See also the patch to the ARM backend that replaces ARMISD::RBIT with ISD::BITREVERSE. r252878, r253047.

Clang commits

Support for __attribute__((internal_linkage)) was added. This is much like C's static keyword, but applies to C++ class methods. r252648.
Clang now supports GCC's __auto_type extension, with a few minor enhancements. r252690.

Other project commits

libcxx gained initial support for building with musl libc. Primarily this is a new CMake option, necessary as musl doesn't provide a macro to indicate its presence. r252457.

Reduce Your Testcases with Bugpoint and Custom Scripts

Thu, 12 Nov 2015 20:18:00 +0000

LLVM provides many useful command line tools to handle bitcode: opt is the most widely known and is used to run individual passes on an IR module, and llc invokes the backend to generate an assembly or object file from an IR module. Less known but very powerful is bugpoint, the automatic test case reduction tool, that should be part of every developer's toolbox.

The bugpoint tool helps to reduce an input IR file while preserving some interesting behavior, usually a compiler crash or a miscompile. Multiple strategies are involved in the reduction of the test case (shuffling instructions, modifying the control flow, etc.), but because it is oblivious to the LLVM passes and the individual backend specifics, "it may appear to do stupid things or miss obvious simplifications", as stated in the official description. The documentation gives some insights on the strategies that bugpoint can use, but the details are beyond the scope of this post.

Read on to learn how you can use the power of bugpoint to solve some non-obvious problems.

Bugpoint Interface Considered Harmful

Bugpoint is a powerful tool to reduce your test case, but its interface can lead to frustration (as stated in the documentation: "bugpoint can be a remarkably useful tool, but it sometimes works in non-obvious ways"). One of the main issues seems to be that bugpoint is ironically too advanced! It operates under three modes and switches automatically among them to solve different kinds of problems: crash, miscompilation, or code generation (see the documentation for more information on these modes). However, it is not always obvious beforehand which mode will be activated and which strategy bugpoint is actually using.

I found that for most of my uses, I don't want the advanced bugpoint features that deal with pass ordering, for example, and I don't need bugpoint to detect which mode to operate in and switch automatically. For most of my usage, the `compile-custom` option is perfectly adequate: similar to `git bisect`, it allows you to provide a script to bugpoint. This script is a black box for bugpoint: it needs to accept a single argument (the bitcode file to process) and needs to return 0 if the bitcode does not exhibit the behavior you're interested in, or a non-zero value otherwise. Bugpoint will apply multiple strategies in order to reduce the test case, and will call your custom script after each transformation to check whether the behavior you're looking for is still exhibited. The invocation for bugpoint is the following:

$ ./bin/bugpoint -compile-custom -compile-command=./check.sh -opt-command=./bin/opt my_test_case.ll

The important parts are the two options -compile-custom and -compile-command=path_to_script.sh, which tell bugpoint to use your own script to process the file. The other important part is the -opt-command option, which should point to the correct opt to use when reducing the test case. By default bugpoint will search the path for opt and may pick up an old system one that won't be able to process your IR properly, leading to some curious error messages:

*** Debugging code generator crash!

Checking for crash with only these blocks:  diamond .preheader .lr.ph .end: error: Invalid type for value

simplifycfg failed!

Considering such a script `check.sh`, running it with your original test case this way:

$ ./check.sh my_test_case.ll && echo "NON-INTERESTING" || echo "INTERESTING"

should display INTERESTING before you try to use it with bugpoint, or you may very well be surprised. In fact bugpoint considers the script as a compile command. If you start with a NON-INTERESTING test case and feed it to bugpoint, it will assume that the code compiles correctly, and will try to assemble it, link it, and execute it to get a reference result. This is where bugpoint's behavior can be confusing: it automatically switches mode and leaves the user with a trace that is hard to interpret. A correct invocation should lead to a trace such as:

./bin/bugpoint  -compile-custom  -compile-command=./check.sh  -opt-command=./bin/opt slp.ll 

Read input file      : 'slp.ll'

*** All input ok

Initializing execution environment: Found command in: ./check.sh

Running the code generator to test for a crash: 

Error running tool:

  ./check.sh bugpoint-test-program-1aa0e1d.bc

*** Debugging code generator crash!

Checking to see if we can delete global inits: <crash>

*** Able to remove all global initializers!

Checking for crash with only these blocks:    .lr.ph6.preheader .preheader .lr.ph.preheader .lr.ph .backedge  ._crit_edge.loopexit... <11 total>: <crash>

Checking for crash with only these blocks: .preheader .backedge .lr.ph6.preheader: 

Checking for crash with only these blocks: .lr.ph ._crit_edge: 

...

...

Checking instruction:   store i8 %16, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 15), align 1, !tbaa !2

*** Attempting to perform final cleanups: <crash>

Emitted bitcode to 'bugpoint-reduced-simplified.bc'
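
The sanity check described before this trace can be folded into a small guard so that bugpoint is never started on a NON-INTERESTING input. The following is only a sketch: the function name `precheck` is hypothetical and not part of bugpoint, and it simply assumes the check command follows the convention above (exit status 0 means non-interesting, non-zero means interesting).

```shell
#!/bin/bash
# precheck CMD [ARGS...]
# Hypothetical helper: runs the check command once and reports how
# bugpoint would classify the input. Succeeds only for an INTERESTING input.
precheck() {
  if "$@"; then
    # Exit status 0: the behavior is not exhibited.
    echo "NON-INTERESTING"
    return 1
  fi
  # Non-zero exit status: the behavior we want to preserve is present.
  echo "INTERESTING"
  return 0
}

# Possible usage, reusing the paths from the invocation shown earlier:
#   precheck ./check.sh my_test_case.ll &&
#     ./bin/bugpoint -compile-custom -compile-command=./check.sh \
#       -opt-command=./bin/opt my_test_case.ll
```

With this guard, a script that accidentally returns 0 on the original test case stops the run immediately instead of sending bugpoint into one of its other modes.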

In practice the ability to write a custom script is very powerful. I will go over a few use cases where I recently used bugpoint.

Search For a String in the Output

I recently submitted a patch (http://reviews.llvm.org/D14364) for a case where the loop vectorizer didn't kick in on a fairly simple test case. After fixing the underlying issue I needed to submit a test with my patch. The original IR was a few hundred lines. Since I believe it is good practice to reduce test cases as much as possible, bugpoint is often my best friend. In this case the analysis result indicates "Memory dependences are safe with run-time checks" on the output after my patch.

Having compiled `opt` with and without my patch and copied each version to `/tmp/`, I wrote this shell script:

#!/bin/bash

/tmp/opt.original -loop-accesses -analyze $1 | grep "Memory dependences are safe"

res_original=$?

/tmp/opt.patched -loop-accesses -analyze $1 | grep "Memory dependences are safe"

res_patched=$?

[[ $res_original == 1 && $res_patched == 0 ]] && exit 1

exit 0 

It first runs the bitcode supplied as argument to the script (the $1 above) through opt and uses grep to check for the presence of the expected string in the output. When grep exits, $? contains 1 if the string is not present in the output and 0 if it is. The reduced test case is valid if the original opt didn't produce the expected analysis but the new opt did.
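
The decision rule can also be separated from the opt invocations, which makes it easy to try on saved output without running bugpoint at all. This is a minimal sketch rather than the script used for the patch: the helper names `has_remark` and `verdict` are hypothetical, and it assumes the output of each opt binary has already been written to a file.

```shell
#!/bin/bash
# has_remark FILE: succeeds when the expected analysis remark is present.
has_remark() {
  grep -q "Memory dependences are safe" "$1"
}

# verdict ORIGINAL_OUT PATCHED_OUT
# Prints the exit status a check script should hand back to bugpoint:
# 1 (interesting) only when the remark is missing from the original
# output and present in the patched output, 0 in every other case.
verdict() {
  if ! has_remark "$1" && has_remark "$2"; then
    echo 1
  else
    echo 0
  fi
}

# Possible use inside a check script (file names are assumptions):
#   /tmp/opt.original -loop-accesses -analyze "$1" > /tmp/original.txt
#   /tmp/opt.patched  -loop-accesses -analyze "$1" > /tmp/patched.txt
#   exit "$(verdict /tmp/original.txt /tmp/patched.txt)"
```

Of the four possible combinations, only "absent before, present after" is reported as interesting, which matches the condition tested at the end of the script above.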

Reduce While a Transformation Makes Effects

In another case (http://reviews.llvm.org/D13996), I patched the SLP vectorizer and I wanted to reduce the test case so that it didn't vectorize before my changes but vectorizes after:

#!/bin/bash

set -e

/tmp/opt.original -slp-vectorizer -S > /tmp/original.ll $1

/tmp/opt.patched -slp-vectorizer -S > /tmp/patched.ll $1

diff /tmp/original.ll /tmp/patched.ll && exit 0

exit 1

The use of a custom script offers flexibility and allows you to run any complex logic to decide whether a reduction is valid. I have used it in the past to reduce crashes on a specific assertion while preventing the reduction from drifting to a different crash, and to reduce test cases when tracking instruction count regressions or any other metric.
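
As an illustration of the "specific assertion" case, a check can require both a failing exit status and a particular message on stderr, so that a reduction that ends up triggering some other crash is rejected. This is a sketch under assumptions: `crashes_with` is a hypothetical helper, and the message and command line passed to it are placeholders to be replaced with the real ones.

```shell
#!/bin/bash
# crashes_with MESSAGE CMD [ARGS...]
# Returns 1 ("interesting" for bugpoint) only when CMD fails AND its
# stderr contains MESSAGE. Returns 0 when CMD succeeds or fails differently.
crashes_with() {
  message=$1
  shift
  # Capture stderr only; stdout is discarded.
  if stderr=$("$@" 2>&1 >/dev/null); then
    return 0                    # no crash at all
  fi
  case $stderr in
    *"$message"*) return 1 ;;   # the crash we are tracking
    *)            return 0 ;;   # some other failure: not interesting
  esac
}

# Possible use as the body of a check script (placeholder message):
#   crashes_with "text of the assertion" /tmp/opt.patched -slp-vectorizer -S "$1"
#   exit $?
```

Matching on the message text is what keeps the reduction anchored to one failure; checking the exit status alone would accept any crash.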

Just Use FileCheck

LLVM comes with a flexible pattern matching file verifier (FileCheck) that the tests use intensively. You can annotate your original test case and write a script that reduces it for your patch. Let's take an example from the public LLVM repository with commit r252051 "[SimplifyCFG] Merge conditional stores". The associated test in the validation is test/Transforms/SimplifyCFG/merge-cond-stores.ll, and it already contains all the checks we need, so let's try to reduce it. For this purpose you'll need to process one function at a time, or bugpoint may not produce what you expect: because the check will fail for one function, bugpoint can do any transformation to another function and the test would still be considered "interesting". Let's extract the function test_diamond_simple from the original file:

$ ./bin/llvm-extract -func=test_diamond_simple test/Transforms/SimplifyCFG/merge-cond-stores.ll -S > /tmp/my_test_case.ll

Then check out and compile opt for revisions r252050 and r252051, and copy the binaries to /tmp/opt.r252050 and /tmp/opt.r252051. The check.sh script is then based on the CHECK lines in the original test case:

#!/bin/bash

# Process the test before the patch and check with FileCheck,
# this is expected to fail.

/tmp/opt.r252050 -simplifycfg -instcombine -phi-node-folding-threshold=2 -S < $1 | ./bin/FileCheck merge-cond-stores.ll

original=$?

# Process the test after the patch and check with FileCheck,
# this is expected to succeed.

/tmp/opt.r252051 -simplifycfg -instcombine -phi-node-folding-threshold=2 -S < $1 | ./bin/FileCheck merge-cond-stores.ll

patched=$?

# The test is interesting if FileCheck failed before and
# succeeded after the patch.

[[ $original != 0 && $patched == 0 ]] && exit 1

exit 0

I intentionally selected a very well written test to show you both the power of bugpoint and its limitations. Look, for instance, at the function we just extracted in my_test_case.ll:

; CHECK-LABEL: @test_diamond_simple

; This should get if-converted.

; CHECK: store

; CHECK-NOT: store

; CHECK: ret

define i32 @test_diamond_simple(i32* %p, i32* %q, i32 %a, i32 %b) {

entry:

  %x1 = icmp eq i32 %a, 0

  br i1 %x1, label %no1, label %yes1

yes1:

  store i32 0, i32* %p

  br label %fallthrough

no1:

  %z1 = add i32 %a, %b

  br label %fallthrough

fallthrough:

  %z2 = phi i32 [ %z1, %no1 ], [ 0, %yes1 ]

  %x2 = icmp eq i32 %b, 0

  br i1 %x2, label %no2, label %yes2

yes2:

  store i32 1, i32* %p

  br label %end

no2:

  %z3 = sub i32 %z2, %b

  br label %end

end:

  %z4 = phi i32 [ %z3, %no2 ], [ 3, %yes2 ]

  ret i32 %z4

}

The transformation introduced in this patch merges the stores in the true branches yes1 and yes2:

declare void @f()

define i32 @test_diamond_simple(i32* %p, i32* %q, i32 %a, i32 %b) {

entry:

  %x1 = icmp eq i32 %a, 0

  %z1 = add i32 %a, %b

  %z2 = select i1 %x1, i32 %z1, i32 0

  %x2 = icmp eq i32 %b, 0

  %z3 = sub i32 %z2, %b

  %z4 = select i1 %x2, i32 %z3, i32 3

  %0 = or i32 %a, %b

  %1 = icmp eq i32 %0, 0

  br i1 %1, label %3, label %2

; <label>:2 ; preds = %entry

  %simplifycfg.merge = select i1 %x2, i32 %z2, i32 1

  store i32 %simplifycfg.merge, i32* %p, align 4

  br label %3

; <label>:3 ; preds = %entry, %2

  ret i32 %z4

}

The original code seems pretty minimal: the variable and block names are explicit, it is easy to follow, and you probably wouldn't think about reducing it. For the exercise, let's have a look at what bugpoint can do for us here:

define void @test_diamond_simple(i32* %p, i32 %b) {

entry:

  br i1 undef, label %fallthrough, label %yes1

yes1:                  ; preds = %entry

  store i32 0, i32* %p

  br label %fallthrough

fallthrough:           ; preds = %yes1, %entry

  %x2 = icmp eq i32 %b, 0

  br i1 %x2, label %end, label %yes2

yes2:                  ; preds = %fallthrough

  store i32 1, i32* %p

  br label %end

end:                   ; preds = %yes2, %fallthrough

  ret void

}

Bugpoint figured out that the no branches were useless for this test and removed them. The drawback is that bugpoint also has a tendency to introduce undef or unreachable here and there, which can make the test more fragile and harder to understand.

Not There Yet: Manual Cleanup

At the end of the reduction, the test is small but probably not ready to be submitted with your patch "as is". Some cleanup is probably still needed: for instance, bugpoint won't convert invokes into calls or remove metadata, TBAA information, personality functions, etc. We also saw earlier that bugpoint can modify your test in unexpected ways, adding undef or unreachable. You will probably also want to rename the variables to end up with a readable test case.

Fortunately, having the check.sh script at hand is helpful in this process, since you can manually modify your test and repeatedly run the same command:

$ ./check.sh my_test_case.ll && echo "NON-INTERESTING" || echo "INTERESTING"

As long as the result is INTERESTING, you know you still have a valid test and can continue with your cleanup.
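
The exit-status convention is easy to get backwards, so here is a minimal Python model of it. This is only a sketch: the function names are hypothetical, and the real decision is of course made by check.sh itself.

```python
def check_exit_status(original: int, patched: int) -> int:
    """Model of the final lines of check.sh.

    `original` and `patched` stand for the two FileCheck exit statuses.
    Returns 1 (interesting) when the check failed before the patch and
    passed after it, and 0 otherwise.
    """
    return 1 if (original != 0 and patched == 0) else 0


def label(exit_status: int) -> str:
    """Model of `./check.sh ... && echo NON-INTERESTING || echo INTERESTING`."""
    return "NON-INTERESTING" if exit_status == 0 else "INTERESTING"
```

In other words, a non-zero exit status is what tells bugpoint (and you) that the test still exhibits the behaviour you care about.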

Keep in mind that bugpoint can do far more, but hopefully this subset will be helpful to those who are still struggling with its command line options.

Finally, I'm grateful to Manman Ren for her review of this post.

LLVM Weekly - #97, Nov 9th 2015

Mon, 09 Nov 2015 08:28:00 +0000

Welcome to the ninety-seventh issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

A number of slide decks have started appearing from last week's LLVM Dev Meeting. The first set of videos made a brief appearance here but apparently they weren't ready for distribution and have been taken down again. In the meantime, you might be interested in the slides for: Living downstream without drowning, Automated performance tracking of LLVM-generated code, opaque pointer types, and debug info on a diet.

Pyston 0.4 has been released. It now features a baseline JIT in addition to the LLVM JIT.

The LLVM-based ELLCC cross-compilation tool chain has had a new release, version 0.1.18. This release has been tested for building the Linux kernel.

There is going to be an LLVM devroom at FOSDEM 2016. Check here for the call for papers and participation. The deadline for receiving submissions is December 1st.

On the mailing lists

As loyal LLVM Weekly readers will know, for a long time now there's been a movement to replace autoconf in the LLVM build system with CMake. It's now at the point where Chris Bieneman suggests we should consider deprecating autoconf. His proposal suggests it is marked deprecated for the 3.8 release and removed after 3.8 branches from the main development tree. This proposal is getting a lot of positive feedback.
After a discussion about the spotty use of the DEBUG_TYPE in passes to prefix debug messages, Daniel Berlin makes the suggestion that a new DEBUG_MSG macro be introduced which will always include the DEBUG_TYPE. Although there are a number of responses indicating how useful they find it when debug messages are prefixed with their source, there doesn't seem to yet be a consensus on whether it's worth replacing all DEBUG(dbgs() << ..) with something like this.
George Burgess is seeking feedback for his proposal on performing nullability analysis in Clang.
Richard Diamond has written a proposal on introducing an llvm.blackbox intrinsic with the purpose of explicitly preventing certain optimisations. So far, there's some confusion about exactly what this intrinsic would do, and whether there's an alternative way to achieve the same aims.
James Molloy proposes adding a new norecurse attribute. With no major objections, this has actually already been committed. See the commit summary below for more information.
David Blaikie is planning to implement an llvm-dwp tool to support building a DWARF package file out of a number of .dwo split debug files. He is seeking feedback on his plan.
Chris Bieneman has been improving support in the CMake build system for bootstrapping a cross-compiler toolchain and has run into issues involving compiler-rt and bootstrapping builtins. There seems to be some support for the third of the proposed options, splitting the builtins and runtime libraries.

LLVM commits

A new optimisation was added to SimplifyCFG to merge conditional stores. The commit message notes it has little impact on the standard LLVM test suite, but it apparently causes many changes in a third party suite. r252051.
Implicit conversions between ilist iterators and pointers are now disabled. All in-tree code has been updated to use explicit conversions, but out-of-tree developers may need to either revert this patch for now or update their code. r252380.
The LoopLoadElimination pass was introduced, which can discover store-to-load forwarding opportunities. r251972, r252017.
Work on operand bundles continues with the addition of a data_operand abstraction. r252077.
LLVM gained portable helper macros for packed struct definitions. r252099.
DebugInfo has been modified so that a reference to a subprogram is stored in function-level metadata rather than subprograms containing a metadata reference to the function they describe. A script to update out-of-tree textual IR is attached here. r252219, r252268.
The norecurse attribute has been introduced. This indicates the function will never recurse into itself, either directly or indirectly, and can be used to demote global variables to locals. r252282.
The notail marker for call instructions was added, which prevents tail or musttail markers being added by the optimizer. r252368.
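
To make the norecurse wording above concrete: "never recurse into itself, either directly or indirectly" means the function is not on any cycle of the call graph. Here is a minimal Python sketch of that check. The call graph dictionary and the name can_recurse are hypothetical and purely illustrative; this is not how LLVM infers the attribute.

```python
def can_recurse(call_graph, fn):
    """Return True if `fn` can reach itself through one or more calls.

    `call_graph` maps a function name to the names it calls directly
    (a hypothetical representation chosen for this example).
    """
    stack = list(call_graph.get(fn, ()))
    seen = set()
    while stack:
        callee = stack.pop()
        if callee == fn:
            return True        # found a path back to fn
        if callee in seen:
            continue
        seen.add(callee)
        stack.extend(call_graph.get(callee, ()))
    return False
```

With a graph such as {"main": ["f"], "f": ["g"], "g": ["f"]}, f and g can recurse indirectly while main cannot, so only main would be a candidate for norecurse in this toy model.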

Clang commits

The idea of 'module file extensions' has been introduced. These add additional information to a module file that can be queried when it's read, allowing tools built on Clang to stash their own data in module files. See the original mailing list RFC for more details. r251955.
Clang now supports the __make_integer_seq template. __make_integer_seq<std::integer_sequence, int, 90000> takes 0.25 seconds while std::make_integer_sequence<int, 90000> takes so long the patch author didn't wait for it to finish. r252036.
The newly-introduced VforkChecker will look for unsafe code in a vforked process. r252285.

Other project commits

LLDB gained an initial Go expression parser. r251820.
compiler-rt now supports 32-bit mingw-w64. r251928.
Some initial documentation has been written on adding programming language support to LLDB. r251831.
LLDB should now be able to directly launch processes on the iOS simulator. r252112.

LLVM Weekly - #95, Oct 26th 2015

Mon, 26 Oct 2015 08:34:00 +0000

Welcome to the ninety-fifth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The C++ Standardization Committee just finished up their most recent meeting, and STL (Stephan T. Lavavej) has provided a useful summary. Herb Sutter has also posted a trip report.

The HHVM team have posted an update on the status of LLVM code generation in HHVM. They managed to get LLVM to equal the performance of their custom backend, but are not going to deploy the LLVM backend to production for now. They're no longer actively working on the LLVM backend, but hope to ensure it doesn't regress.

Hal Finkel is proposing an LLVM social in Austin on the evening of November 15th. There should be a high density of LLVM users due to the LLVM in HPC workshop.

On the mailing lists

The biggest discussion this week in the LLVM community is the proposed change to the Apache license. One motivation is that some companies feel blocked from contributing due to the wording in the patent section of the LLVM developer policy, though see the linked message for a full summary. Concerns were raised about the fact that Apache 2 is incompatible with the GPLv2 and that license complexity may put off contributors, such as the FreeBSD community.
Robert Cox has posted an RFC on adding the ability for LLVM to produce an inlining report. This report would give details on where inlining has taken place and why. Initial feedback is positive.
More information for birds of a feather sessions at the upcoming LLVM dev meeting has gone out. Kristof Beyls is running one on performance tracking and benchmarking infrastructure, John Criswell on sophisticated program analysis on LLVM IR, and Paul Robinson on living downstream without drowning.
Gaël Jobin has a fantastic answer to a question about handling intrinsics in your backend.
Louis Brandy is a Facebook employee who is starting to work on enabling Clang module support for their C++ codebase. He's interested in experiences from anyone on incrementally adding module maps to a large codebase.
A lot of work has been done to extend Clang's static analyzer to support interprocedural analysis. This thread discusses the current state and path forward. It's not giving the improvement expected (in terms of detected bugs-per-second) and the thread discusses thoughts on why.

LLVM commits

The TargetLoweringBase::LibCall LegalizeAction has been introduced. This allows backends to control whether they prefer expansion or conversion to a libcall. r250826.
The Hexagon backend continues to accumulate sophisticated target-specific optimisations. HexagonBitSimplify contains a number of transformations to perform simplifications, redundant code elimination, etc. r250868.
The new AliasAnalysis infrastructure gained an optional 'external' AA wrapper pass, to allow users to merge in external AA results. The unit test included in the patch gives a good example of how to use this. r250894.
CodeGenPrepare can now transform select instructions into branches and sink expensive operands. r250743.
Loop rotation can now use profile data in making decisions during MachineBlockPlacement. r250754.
ValueTracking now has an isKnownNonEqual predicate. r251012.

Clang commits

Basic (currently parsing and basic semantic analysis) support for the anticipated C++1z coroutine feature was added. r250980, r250985, r250993.
-fvisibility=internal is now aliased to -fvisibility=hidden, as LLVM doesn't currently support internal visibility. r250954.
Clang's static analyzer learnt to associate hashes with found issues. This hash aims to be resilient to code changes, so should be useful for suppressing false positives. r251011.

Other project commits

lld gained support for lazy relocations on x86-64. r250808.
The new LLD ELF linker now supports the --gc-sections parameter. This increases the time to link Clang by 8% but reduces the size of the output binary by 4%. r251043.
LLDB gained a REPL. r250753, r250773.
DWARF parsing in LLDB can now be multi-threaded, which can drastically increase the speed of loading debug info. r251106.

LLVM Weekly - #94, Oct 19th 2015

Mon, 19 Oct 2015 03:53:00 +0000

Welcome to the ninety-fourth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

A good time was had by all at ORConf last week at CERN. We had over 100 open source hardware enthusiasts join us in Geneva. You can find my slides updating on lowRISC here. Videos should appear on YouTube in the next week or so.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

A six month retrospective of LLILC, the project to produce an open source LLVM-based compiler for .NET, has been posted. It describes work still to be done for garbage collection and exception handling, code size and code quality, and JIT throughput.

The schedule for the 2015 LLVM Developers' Meeting is now available.

The new ELF linker in LLD is looking pretty fast. Right now it can link Clang in about half the time of binutils gold. However, the resulting binary is larger. It will be interesting to see how the performance compares when both are at feature parity, but this is looking promising.

On the mailing lists

Chandler Carruth, on behalf of the board of the LLVM Foundation, has posted an RFC on introducing an LLVM Community Code of Conduct. The proposal is based on the Django code of conduct and generated masses of discussion. A couple of days later, Chandler posted a second draft incorporating feedback and answering many of the questions raised. The response appears to be good so far. I'll just highlight one of the questions and answers: "Q: Is this trying to change how the community behaves?" "A: I think the resounding answer is no, this is very much meant to formalize the existing extremely polite and respectful behavior that the LLVM community has had for many years."
There is going to be a birds of a feather session about the future of LLVM's C APIs at the upcoming LLVM developers' meeting, and Justin Bogner has helpfully shared some notes in preparation for this.
Philip Reames has shared some suggested topics for the managed languages birds of a feather meeting at the upcoming LLVM devmeeting. Joe Ranieri suggests some additional topics.
Chris Matthews has shared an RFC on adding background workers to LNT.
Diego Novillo is going to be hosting a birds of a feather on profile-guided optimisations at the upcoming dev meeting and has shared a preliminary list of topics for discussion.
Sanjoy Das has updated us on his work with operand bundles and gc transition arguments, and is seeking input and opinions on his suggested ways forward.
Zachary Turner has written about his efforts to support Python 3 with LLDB.
Evgenii Stepanov has posted an RFC on adding an internal linkage attribute. The message explains why setting always_inline and hidden symbol visibility is not enough.
Arch Robison initiated a discussion on extending the SLP vectorizer to work with tuples in Julia.

LLVM commits

Hexagon gained a new pass to merge adjacent stores. r250542.
Hexagon gained skeleton support for the 'HVX' extension instructions. r250600.
The loop vectorizer will now shrink integer operations into the smallest type possible. r250032.
Documentation has been added for binary sample profile encoding. r250309.
RewriteStatepointsForGC is starting to make use of operand bundles. r250489.

Clang commits

Clang gained support for the -fdebug-prefix-map= option as in GCC. r250094.
The PS4 toolchain definition has been added to Clang. r250262.
Clang now understands -flto=thin. r250398.

Other project commits

The libc++ testing guide has been updated. r250323.
LLD got even faster at linking clang. r250315.
LLDB gained preliminary NetBSD support. r250146.

LLVM Weekly - #93, Oct 12th 2015

Mon, 12 Oct 2015 11:14:00 +0000

Welcome to the ninety-third issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

Apologies that this week's issue comes rather late in the day. My laptop gave up the ghost over the weekend while I was travelling, leaving me with no way to write it. Now I'm back, I've managed to dust off an old desktop from my closet to write this issue (and to keep my unbroken streak). LLVM Weekly has been sent out every Monday since it started on the first Monday of January 2014. This weekend I was talking about lowRISC at ORConf 2015. You can find my slides here. There was a wide array of talks on open source hardware, including many on lowRISC and RISC-V. The videos should hopefully be posted in the next week or so.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The LLVM project has hit 250,000 commits. The commit that managed to hit this milestone was this one-liner.

A new paper by Bjarne Stroustrup, Herb Sutter, and Gabriel Dos Reis gives more details on their plans for memory safety in C++.

Videos from CppCon2015 are being posted to YouTube.

On the mailing lists

Ed Maste is taking a look at the feasibility of using the new LLD ELF linker for FreeBSD, and has shared his initial findings.
When is addrspacecast needed? Both David Chisnall and Mats Petersson have good answers.
Vedant Kumar has posted an RFC on cleaning up the way optional Function data is stored.
Dehao Chen has shared an update on the AutoFDO project, which allows the use of a perf.data profile for profile-guided optimisation. Clang built with -fprofile-sample-use is about 10% faster than Clang built with -O3.
Larisse Voufo has shared a proposal for optimizing const C++ objects in LLVM.
Chris Matthews has announced the open-sourcing of an llvm bisect tool, for bisecting bugs using prebuilt LLVM and Clang revisions.
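
For readers unfamiliar with the idea behind bisecting over prebuilt revisions, here is a minimal Python sketch of the underlying binary search. It is not the announced tool: the function name, the list of revisions and the is_bad callback are all hypothetical, and it assumes the first revision is good, the last is bad, and every revision after the first bad one is also bad.

```python
def first_bad_revision(revisions, is_bad):
    """Binary-search a sorted list of revisions for the first bad one.

    Assumes revisions[0] is good, revisions[-1] is bad, and that the
    bug, once introduced, is present in every later revision.
    """
    lo, hi = 0, len(revisions) - 1   # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]
```

Each call to is_bad would correspond to running a test with one prebuilt compiler, so the number of compiler runs grows only logarithmically with the number of revisions.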

LLVM commits

The Hexagon architecture gained an early if-conversion pass. r249423.
ThinLTO has started to land, in particular support for function summary index bitcode sections and files. r249270.
Codegen for ARM's memcpy intrinsic has been modified to make better use of LDM/STM. r249322.
The llvm.eh.exceptioncode intrinsic was added. r249492.
It is now possible to turn off MMX support without disabling SSE. r249731.

Clang commits

The policy for adding new style options to clang-format has been documented. r249289.
The libclang bindings have been extended with accessors for C++ function attributes (pure virtual, virtual, or const). r250008.

Other project commits

GoLanguageRuntime was introduced to LLDB, which supports finding the runtime type for Go interfaces. r249456, r249459.
The new LLD ELF linker now supports the --as-needed option. r249998.
LLDB for MIPS is now able to emulate microMIPS instructions. r249381.
liblldb is working towards being able to work under both Python 2.x and 3.x. r249886.

LLVM Weekly - #92, Oct 5th 2015

Mon, 05 Oct 2015 05:32:00 +0000

Welcome to the ninety-second issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Most of the presentation materials from CppCon2015 are now online. Talks that may be of particular interest include Kostya Serebryany on fuzzing with libFuzzer, Piotr Padlewski on C++ devirtualization in Clang, and JF Bastien talking about C++ on the web.

Rafael Espíndola wrote in to share an impressive milestone for the new LLD ELF linker. It can now link itself and all of LLVM and Clang (though not all tests pass, and you must use LLVM_ENABLE_THREADS=OFF). Things will of course get really interesting once LLD matures if it can compete with Gold in terms of speed.

The next Paris LLVM social will take place on October 15th. Calixte Denizet will be talking about Scilab's usage of LLVM.

On the mailing lists

David Li has posted an update on efforts to reduce the size overhead of profile-guided optimisation. He's produced an initial implementation of one of the proposals, which reduces the size of a release clang binary with coverage mapping from 986MB to 569MB.
It's a new month, so time for a new update on CMake's ability to replace autoconf in LLVM. As Chris says, we're getting very close now.
Tom Stellard has posted a proposed release schedule for LLVM/Clang 3.7.x point releases. The deadline to propose patches for 3.7.1 is November 2nd and November 30th for 3.7.2. Tom is also asking that people nominate patches using Phabricator rather than email.
Renato has kicked off a discussion on buildbot noise with a good summary of the issues and potential ways forward.
Chris Matthews is looking for feedback on how people are using orders in LNT, so as to better understand how to improve things. As he explains, the 'Order' is the SVN revision of the compiler.
Jeroen Ketema is asking if anyone has any objections to a change to ARM's NEON vld and vst intrinsics. The change would allow an address space to be associated with the pointer these intrinsics take.
Jonas Paulsson is curious about how to control selection of two-address vs three-address instruction forms. Several responses suggest just allowing three-address forms to be selected and having a late pass that converts to the two-address form where possible. Jonas has an interesting follow-up question as to whether the register allocator will produce the maximum number of opportunities for this conversion.

LLVM commits

A scheduler for the MIPS P5600 processor landed. r248725.
Align metadata for the load instruction was introduced. r248721.
Support for Windows exception handling continues with support in AsmPrinter for 'funclets'. r248824.
Support landed for the HHVM JIT calling convention. r248832.

Clang commits

clang-format's #include sorting functionality has been extended. r248782.

Other project commits

The new ELF linker gained initial support for MIPS. r248779.
Some basic linker script support was added to the new ELF linker, enough to parse Linux's libc.so. r248918.
.ARM.exidx and .ARM.extab unwind information is now supported by lldb. r248903.

LLVM Weekly - #91, Sep 28th 2015

Mon, 28 Sep 2015 04:16:00 +0000

Welcome to the ninety-first issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Some slides and videos from cppcon have started to appear. See the slides from Bjarne Stroustrup's keynote (which focus on the Core Guidelines project), as well as Herb Sutter's slides on supporting ownership annotations in C++ (this will be familiar to anyone who has used Rust). Videos are starting to appear on the CppCon Youtube channel.

A Microsoft blog post outlines their plans to rejuvenate MSVC. "We will continue to invest in improving our compiler with a goal towards making it fully standards compliant".

On the mailing lists

Jason Kim has posted a summary of a recent in-person meeting about ThreadSanitizer, Android, and AArch64.
Chris Matthews is planning to work on improving performance change tracking in LNT, and has posted an RFC on his plans.
The discussion/debate in the 'trouble with triples' thread has rumbled on. There's been a lot of in-depth discussion, which I'm afraid I don't have the time to study and try to summarise fairly. Do dive in if this topic is important to you. Otherwise, I hope we'll see a summary/RFC once a way forward is found.
Jia-Ju Bai is looking for a way to extract basic loop information from LLVM IR. There's a good range of suggested starting points: LoopInfoWrapperPass, ScalarEvolution, InductiveRangeCheckElimination, and Polly.
Wolfgang Pieb is interested in extending the liveness of the 'this' pointer for a better debugging experience. His question is whether there's a better way of doing this than creating a new 'fake use' intrinsic. Kevin Smith suggests always storing the 'this' pointer in memory, but Wolfgang is concerned about the overhead of this approach.

LLVM commits

The AArch64 machine reassociation code has been refactored to be target-independent. r248164.
LLVM's SafeStack now supports Android. r248405.
A new target hook has been added for optimizing register copies. r248478.
Operand bundles are now supported for CallInst and InvokeInst. Initial support was also added to LLVM bitcode. r248527, r248551.

Clang commits

The iOS/OSX localizability checks have been improved. r248350.
Some more PS4 toolchain code landed. r248546.

Other project commits

The new ELF linker should now be able to create binaries for FreeBSD. r248554.
The new ELF linker gained initial AArch64 support. r248641.

LLVM Weekly - #90, Sep 21st 2015

Mon, 21 Sep 2015 05:10:00 +0000

Welcome to the ninetieth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The ISO C++ committee have started putting together a set of C++ Core Guidelines. The document describes itself as a set of guidelines for using C++ well, with the intention that adherence to the rules could be checked by an analysis tool. Bjarne Stroustrup and Herb Sutter are acting as editors for this project.

A reddit user has posted a detailed description of how they use libclang to generate reflection data for C++.

Andrew Chambers has written a blog post about his use of fuzzing to look for ABI bugs.

This short and sweet blog post introduces the clazy static checker, a simple checker for some common suboptimal uses of Qt types. There are plenty of ideas in the comments for further analyses that might be useful.

On the mailing lists

The discussion on 'the trouble with triples' has resumed. Both Daniel Sanders and Renato Golin give examples of the kind of problems they're dealing with (yet again, naming things proves to be one of the great challenges in CS).
Escha has been looking at optimising passes using 'side-data'. This might mean e.g. making use of a spare bit in Value to indicate liveness. The question is whether this is something we should be looking to do in LLVM. Daniel Berlin comments that optimising these kinds of cases would be useful in the GVN rewrite. Chris Lattner follows up with a sketch of how manipulation of a marker bit might be exposed.

LLVM commits

Assert builds will now produce human-readable numbers to identify dumped SelectionDAG nodes. "0x7fcbd9700160: ch = EntryToken" becomes "t0: ch = EntryToken". r248010.
Basic support for reading GCC AutoFDO profiles has landed. r247874.
The llvm-mc-fuzzer tool has been documented. r247979.
The llvm.invariant.group.barrier intrinsic was born. r247711.
The LLVM default target triple can now be set to the empty string at configure time. r247775.

Clang commits

AST matcher functions have been renamed to match the AST node names directly. This is a breaking change. r247885, r247887.
The static analyzer gained a new Objective-C checker. DynamicTypeChecker will check for cases where the dynamic and static type of an object are unrelated. r248002.

Other project commits

The LLD COFF linker has gained some extra parallelisation. Self-link time has now improved from 1022ms to 654ms. r248038, r248078.
Support code was added to LLDB for recognising and printing Go types. r247629.
MemorySanitizer has been enabled for AArch64. r247809.

LLVM Weekly - #89, Sep 14th 2015

Mon, 14 Sep 2015 08:03:00 +0000

Welcome to the eighty-ninth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

I didn't spot any new LLVM-related articles or news this week. As a reminder, I always welcome tips either via email or Twitter. Seeing as there's nothing new, now seems a good time to point you towards either Stephen Diehl's tutorial on implementing a JIT compiled language with Haskell and LLVM or Adrian Sampson's 'LLVM for grad students'.

On the mailing lists

James Knight is proposing to deprecate and remove the old SelectionDAG scheduler, given that machine schedulers are now the preferred approach. He notes that a number of in-tree targets still use the SelectionDAG scheduler. It seems there is support for this plan.
Jauhien is curious about the availability of a C API for the ORC JIT, with the motivating use case here being to provide a binding for Rust. The main concern is that the ORC API is not yet stable, meaning it's not feasible to provide stable C bindings. The proposal is they live in llvm/include/llvm-c/unstable.
Joseph Tremoulet has a whole bunch of questions about addrspacecast semantics, and Chandler Carruth has a whole bunch of answers.
David Chisnall has a useful response to a question about implementing LLVM intrinsics with multiple return values. As he points out, this is usually done by returning a struct.

LLVM commits

A major modification of LLVM's alias analysis manager has landed in order to port it to the new pass manager. See the commit message for full details. r247167.
The scalar replacement of aggregates (SROA) pass has been ported to the new pass manager. In the commit message, Chandler comments he hopes this serves as a good example of porting a transformation pass with non-trivial state to the new pass manager. r247501.
The GlobalsModRef alias analysis pass is now enabled by default. r247264.
Emacs users, rest your aching pinky fingers for a moment and rejoice. A range of improvements for the Emacs LLVM IR mode have landed. r247281.
The AArch64 backend can now select STNP, the non-temporal store instruction (this hints that the value need not be kept in cache). r247231.
Shrink wrapping optimisations are enabled on PPC64. r247237.
A whole bunch of StringRef functions have been sprinkled with the ALWAYS_INLINE attribute so as to reduce the overhead of string operations even on debug LLVM builds. Chandler has also been making other changes to improve the performance of check-llvm with a debug build. r247253.
The LLVM performance tips document has been extended to detail the use of allocas and when to specify alignment. r247301.
The hasLoadLinkedStoreConditional TargetLoweringInformation callback has now been split into bool shouldExpandAtomicCmpXchgInIR(inst) and AtomicExpansionKind shouldExpandAtomicLoadInIR(inst). r247429.

Clang commits

A new control-flow integrity variant has been introduced, indirect function call checking (enabled with -fsanitize=cfi-icall). This checks that the dynamic type of the called function matches the static type used at the call. r247238.
A new -analyzer-config option is available to modify the size of function that the inliner considers as large. r247463.
Clang will now try much harder to preserve alignment information during IR-generation. r246985.
The __builtin_nontemporal_store and __builtin_nontemporal_load builtins have been introduced. r247104, r247374.
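
A minimal sketch of how the new non-temporal store builtin might be used from C++. The helper streamFill and the SKETCH_HAVE_NT_STORE macro are hypothetical names; the builtin is only used when the compiler reports it, and a plain store is the fallback elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Detect the Clang builtin; other compilers fall back to a plain store.
#if defined(__has_builtin)
#if __has_builtin(__builtin_nontemporal_store)
#define SKETCH_HAVE_NT_STORE 1
#endif
#endif

// Writes `value` to dst[0..n), hinting (where supported) that the
// written data need not be kept in cache.
void streamFill(int *dst, std::size_t n, int value) {
  assert(dst != nullptr || n == 0);
  for (std::size_t i = 0; i < n; ++i) {
#ifdef SKETCH_HAVE_NT_STORE
    __builtin_nontemporal_store(value, &dst[i]);
#else
    dst[i] = value;
#endif
  }
}
```

Whether the hint actually helps depends on the target and the access pattern; it is most plausible for large buffers that are written once and not read back soon.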

Other project commits

libcxx gained implementations of Boyer-Moore and Boyer-Moore-Horspool searchers (for the language fundamentals technical specification). r247036.
A trivial dynamic program linked with the new ELF lld now works with musl's dynamic linker. r247290.
LLD's COFF linker learned to merge cyclic graphs, which means self-linking now produces a 27.7MB rather than a 29.0MB executable. MSVC manages to produce a 27.1MB executable, so there is still room for improvement. r247387.

LLVM Weekly - #88, Sep 7th 2015

Mon, 07 Sep 2015 10:57:00 +0000

Welcome to the eighty-eighth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The biggest news from the past week is of course the release of LLVM and Clang 3.7. See the LLVM release notes and the Clang release notes for more details.

Slides from the 2015 GNU Tools Cauldron are now available online.

Version 1.12 of TCE, the TTA-based co-design environment has been released.

On the mailing lists

David Li has posted an RFC on reducing the size overhead for profile-guided optimisation. He observes that right now, Clang's PGO instrumentation increases binary size by 4.6X compared to 2.8X for GCC.
The Hip-Hop Virtual Machine team (a JITting virtual machine for PHP and Hack) have been looking at utilising LLVM as a backend. As part of this work, they've generated a number of patches that they're now looking to upstream. Sanjoy and Philip (Azul) have volunteered to help review the patches. With active work on Java (Azul), MSIL/C# (Microsoft), Python (Dropbox) and now PHP/Hack (Facebook) there seems to be a growing number of teams looking at improving LLVM when used for optimising higher-level languages.
Dylan McKay has been maintaining and developing an AVR backend for LLVM out of tree, and is now interested in merging it upstream.
Steve King is proposing a new LoopExitValues pass which aims to remove recomputations of loop exit values. This follow-up message perhaps makes it clearer what the pass does.
Teresa Johnson continues to work on ThinLTO and has now shared a revamped ThinLTO file format document. She's also created this handy website to track the current RFCs and patches for ThinLTO.
John Regehr has shared some results from Souper that point to areas where computeKnownBits could be improved.
I missed this last week, but Ben Craig has been looking at ways to improve the speed of Clang's static analyzer. Ted Kremenek gives some useful general guidance.

LLVM commits

The LLVM plugin for the gold linker now supports parallel LTO code generation. r246584.
The 'unpredictable' metadata annotation is now supported. This can be used to signal that a branch or switch is unpredictable. r246888.
A tool built on libFuzzer to fuzz llvm-as has been added. r246458.
The FunctionAttrs pass learned to infer nonnull attributes on returns. r246476.
Work on Windows exception handling continues with the addition of the cleanupendpad instruction and the llvm.eh.exceptionpointer intrinsic. r246751, r246752.

Clang commits

Basic support for the WebAssembly target landed in Clang. Basic codegen is supported, but not yet assembling or linking. r246814.
Clang will now warn when you reference object members from a handler of a constructor/destructor function-try-block. r246548.
Clang learnt the __builtin_unpredictable builtin, which will generate the newly added unpredictable metadata. r246699.
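
For a feel of how __builtin_unpredictable might appear in source, here is a small sketch. The function countAbove and the SKETCH_UNPREDICTABLE wrapper are hypothetical; the wrapper degrades to the bare condition on compilers that do not report the builtin.

```cpp
#include <cassert>
#include <cstddef>

// Use the Clang builtin when available, otherwise pass the condition through.
#if defined(__has_builtin)
#if __has_builtin(__builtin_unpredictable)
#define SKETCH_UNPREDICTABLE(cond) __builtin_unpredictable(cond)
#endif
#endif
#ifndef SKETCH_UNPREDICTABLE
#define SKETCH_UNPREDICTABLE(cond) (cond)
#endif

// Counts the elements strictly greater than `threshold`. The branch is
// marked unpredictable on the assumption that the data is effectively random.
std::size_t countAbove(const int *values, std::size_t n, int threshold) {
  assert(values != nullptr || n == 0);
  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (SKETCH_UNPREDICTABLE(values[i] > threshold))
      ++count;
  }
  return count;
}
```

The annotation does not change the result; it only gives the optimiser a hint about the branch.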

Other project commits

The new ELF lld linker gained basic archive file support. r246886.
Language plugins in LLDB can now provide data formatters. r246568.

LLVM 3.7 Release

Tue, 01 Sep 2015 14:15:00 +0000

It is my pleasure to announce that LLVM 3.7.0 is now available!

Get it here: http://llvm.org/releases/

This release contains the work of the LLVM community over the past six months: full OpenMP 3.1 support (behind a flag), the On Request Compilation (ORC) JIT API, a new backend for Berkeley Packet Filter (BPF), Control Flow Integrity checking, as well as improved optimizations, new Clang warnings, many bug fixes, and more.

For details on what's new, see the release notes [LLVM, Clang].

Many thanks to everyone who helped with testing, fixing, and getting the release into a good state!

Special thanks to the volunteer release builders and testers, without whom this release would not be possible: Dimitry Andric, Sebastian Dreßler, Renato Golin, Pavel Labath, Sylvestre Ledru, Ed Maste, Ben Pope, Daniel Sanders, and Nikola Smiljanić!

If you have any questions or comments about this release, please contact the community on the mailing lists. Onwards to 3.8!

LLVM Weekly - #87, Aug 31st 2015

Mon, 31 Aug 2015 05:03:00 +0000

Welcome to the eighty-seventh issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

It's a bank holiday weekend here in the UK, so apologies that you're reading this a few hours later than usual. As a quick reminder, if you're able to be in Geneva for the 9th-11th October then you should definitely come along to ORConf.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

At the time of writing, LLVM 3.7.0 has not yet seen its official release, but it has been tagged. The final release should be out within the next day or so. Congratulations to everyone involved.

The deadline for submissions to the LLVM in HPC workshop has been extended to Friday, September 4th.

Save the date! The next Cambridge LLVM Social will be on Wednesday 30th September.

On the mailing lists

Jingyue Wu has shared a design doc on NVPTX memory space inference and on straight-line scalar optimisations.
An RFC from Swaroop Sridhar on extending alloca to allow the specification of the address space of the allocation has resulted in a lot of discussion and is well worth a read for those interested in the intricacies of supporting GC in LLVM. Of particular interest is this design doc from Philip Reames on the interaction of stack-based allocation and GC in LLVM. See also this summary from Joseph Tremoulet of his understanding of how LLILC is hoping to use address spaces.
There's been some discussion on flaky buildbots. Renato details his concerns with simply ignoring these bots. Some of the issues with the bots in question may be due to the use of incremental builds.

LLVM commits

The 'kaleidoscope' tutorial has seen a major update. Now, rather than introducing MCJIT it describes how to use ORC, building a custom JIT called KaleidoscopeJIT. r246002.
WebAssembly backend implementation work has been continuing over the past few weeks. The individual commits tend to be small and focused (as good commits should be). I mainly wanted to include a mention to highlight how work is ongoing. e.g. this recent commit added support for emitting simple call s-expressions. r245986.
The documentation on statepoints now has more to say about base pointers and related assumptions and optimisations. r246103.
Constant propagation is enabled for more single-precision math functions, such as acosf, powf, logf. r246194.
The function llvm::splitCodeGen has been introduced in order to support the implementation of parallel LTO code generation. It uses SplitModule to split the module into linkable partitions that are distributed among threads to be codegenned. r246236.
There's been another change to DebugInfo. DISubprogram definitions must now be marked as distinct. The commit message includes a suggested script for updating IR. r246327.
Chandler has been doing some refactoring of the ARM target parsing code with the hope of making it more efficient. He's reduced the cases where the code is called, which has a noticeable effect on some LLVM timings (e.g. check-llvm with non-optimized builds is 15 seconds faster). r246370, r246378.

Clang commits

A NullabilityChecker has been introduced, which is designed to catch a number of nullability-related issues. r246105.

Other project commits

ThreadSanitizer is now enabled for AArch64 with 42-bit virtual addressing on Linux. r246330.
libcxx now contains release goals for 3.8 in its TODO.txt. This includes the Filesystem TS and the asynchronous I/O TS. r245864.
LLD's ELF linker gained a basic AMDGPU ReaderWriter that enables it to emit binaries that can be consumed by the HSA runtime. r246155.
LLD's COFF linker gained support for parallel LTO code generation. r246342.
LLDB now supports hardware watchpoints on ARM. r245961.
The concept of 'language plugins' was introduced to LLDB. These will provide language-specific data formatters or expression evaluation. r246212.

LLVM Weekly - #86, Aug 24th 2015

Mon, 24 Aug 2015 02:56:00 +0000

Welcome to the eighty-sixth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The LLVM Foundation has been granted 501(c)(3) non-profit status. This means contributions are tax-deductible for US tax payers.

LLVM 3.7-rc3 has been tagged. This is the final release candidate and 3.7.0 final is expected very shortly.

The paper Fast and Precise Symbolic Analysis of Concurrency Bugs in Device Drivers makes use of Clang and LLVM as part of its verification flow.

Good news everyone! The deadline for submissions for the 2015 LLVM Developers' meeting has been extended to August the 25th.

On the mailing lists

Edward Jones and Simon Cook at Embecosm have been developing an LLVM backend for AAP, a 16-bit architecture which aims to be representative of common deeply embedded microprocessors. They're looking for feedback on upstreaming it. The architecture reference can be found here.
Alex Lorenz has shared an update on the status of Machine IR serialization. There's also the beginnings of a reference manual for it, which covers the syntax and how to use it in tests.
'deadal nix' has posted an RFC on supporting load and store for large aggregates. The author gives more detail here.
Lang Hames proposes two changes to the llvm.memcpy and llvm.memmove intrinsics. It seems that, based on feedback, the plan is to add alignment information via metadata.
Evgenii Stepanov has an RFC on a minor fix to codegen for AlwaysInline functions. Right now, if an alwaysinline function is only called from dead code it may not be inlined.
Rong Xu has shared some interesting numbers on his investigations regarding front-end vs middle-end instrumentation of binaries.
Renato Golin raises concerns about the introduction of 'hacks' into LLVM. In this particular instance, it's a patch for ThreadSanitizer on AArch64 Android. He elaborated on his concerns here and here.

LLVM commits

TransformUtils gained the module splitter, which splits a module into linkable partitions and is intended to be used for parallel LTO code generation. r245662.
MergeFunctions is now closer to being deterministic. r245762.
ScalarEvolution has been ported to the new pass manager. r245193.
The 'kaleidoscope' tutorials on creating a language backend using LLVM are now partially updated to use C++11 features and idioms. r245322.
The peephole optimiser learned to look through PHIs to find additional register sources. r245479.

Clang commits

The ObjCGenericsChecker will catch type errors related to lightweight generics in Objective-C. r245646.

Other project commits

compiler-rt has gained implementations of some of the missing ARM EABI runtime functions. r245648.
libcxx gained a whole bunch of Sphinx-based documentation. r245788.

LLVM Foundation Granted 501(c)(3) Nonprofit Status

Thu, 20 Aug 2015 08:16:00 +0000

The LLVM Foundation is proud to announce it has been officially approved as a public charity with tax-exempt status under Section 501(c)(3) of the United States Internal Revenue Code. Contributions donated to the LLVM Foundation are fully tax deductible, retroactive to the organization establishment date of May 5, 2014.

The LLVM Foundation's primary mission is to provide accessible and informative educational tools for the LLVM Project and compiler technology to the general public. These educational tools include events such as the annual LLVM Developers' meeting. The LLVM Foundation also gives grants or scholarships to other nonprofit organizations and individuals (such as student travel to LLVM Foundation events). Lastly, the LLVM Foundation funds the infrastructure necessary to support the LLVM Project.

We hope to begin accepting donations online before the end of the year. If you are interested in contributing, please contact your employer: they may be willing to match contributions to a 501(c)(3) charity.

The IRS letter granting the LLVM Foundation 501(c)(3) tax-exempt status is available upon request until we can get it posted online. Please contact Tanya Lattner ([email protected]), President of the LLVM Foundation.

LLVM Weekly - #85, Aug 17th 2015

Mon, 17 Aug 2015 03:22:00 +0000

Welcome to the eighty-fifth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

If you're interested in open source hardware, lowRISC, RISC-V, OpenRISC, and more then consider joining us at ORConf 2015 in October. I'm also looking for talk submissions.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Videos from April's EuroLLVM are now online.

The deadline for the 2015 LLVM Developer's Meeting call for papers is rapidly approaching. Get your proposal in by August 20th, 11:59PM PDT.

A new paper covering the AST generation techniques used in Polly in great detail has been published in the July issue of TOPLAS. You can read the preprint here.

The Customizable Naming Convention Checker (CNCC) is a new Clang-based tool that can be used to validate class, field, variable, and namespace naming conventions against a chosen regular expression.

EvilML is a deliciously terrifying compiler that compiles ML to C++ template language.

On the mailing lists

With the 3.7 release on its way, it's important to fix up the release notes so they reflect the work that's been done over the past six months.
Dylan McKay proposes making the existing 'expand' action of the instruction selection legalizer into split and expand. Part of the motivation for this is the author's work on an AVR backend.
Wang Nan has posted an RFC on adding an llvm.typeid.for intrinsic. This would be used with the BPF backend to specify the type for buffers passed to perf.
Teresa Johnson has posted an update on her ThinLTO work.
Peter Collingbourne has proposed a simple approach to parallelising codegen for link-time optimisation. This gives a speedup on an HP Z620 from 15m20s to 8m06s when 4 partitions are used. Speedup beyond that is limited (partially due to Amdahl's law).
Sanjoy Das has shared an RFC for adding operand bundles to calls and invokes. These would be used to help track state required for deoptimization, for instance by attaching a 'deopt' operand bundle to relevant calls which contains the state of the abstract virtual machine.
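
To put the parallel codegen timings above in context, here is a small worked sketch of Amdahl's law in C++. The function names are hypothetical, and the calculation assumes the idealised model S = 1 / ((1 - p) + p / n) applies exactly, which the original thread only partially claims.

```cpp
#include <cassert>

// Observed speedup: serial time divided by parallel time.
double observedSpeedup(double serialSeconds, double parallelSeconds) {
  assert(serialSeconds > 0 && parallelSeconds > 0);
  return serialSeconds / parallelSeconds;
}

// Amdahl's law: S = 1 / ((1 - p) + p / n).
// Solving for the parallel fraction p gives p = (1 - 1/S) / (1 - 1/n).
double impliedParallelFraction(double speedup, int workers) {
  assert(speedup > 0 && workers > 1);
  return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers);
}

// Limit of the Amdahl speedup as the number of workers grows without bound.
double speedupCeiling(double parallelFraction) {
  assert(parallelFraction >= 0 && parallelFraction < 1);
  return 1.0 / (1.0 - parallelFraction);
}
```

Plugging in 920 seconds (15m20s) and 486 seconds (8m06s) with 4 partitions gives a speedup of roughly 1.89 and an implied parallel fraction of roughly 0.63. Under this simplified model, no number of partitions would push the speedup much past 2.7.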

LLVM commits

MergeFunctions has been sped up substantially by hashing functions and comparing that hash before performing a full comparison. This results in a speedup of 46% for MergeFunctions in libxul and 117% for Chromium. r245140.
i64 loads and stores are now supported for 32-bit SPARC. This is a little fiddly to support as the LDD/STD instructions need a consecutive even/odd pair of 32-bit registers. r244484.
Machine basic blocks are now serialized using custom syntax rather than YAML. A later commit documented this syntax. r244982, r245138.
A new TargetTransformInfo hook has been added for specifying per-target defaults for interleaved accesses. r244449.
The llvm.loop.unroll.enable metadata was introduced. This will cause a loop to be unrolled fully if the trip count is known at compile time and partially if it isn't (unlike llvm.loop.unroll.full which won't unroll a loop if the trip count isn't known). r244466.
Rudimentary support for the new Windows exception handling instructions has been introduced. r244558.
Token types have been added to LLVM IR. r245029.
The BPF backend gained documentation and an instruction set description. r245105.
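
The MergeFunctions speedup above comes from comparing a cheap hash before doing a full comparison. The sketch below shows the same general idea on plain strings; it is not LLVM's implementation, and countDistinct is a hypothetical name chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Returns the number of distinct strings in `items`. The expensive full
// comparison only runs between items whose hashes already match.
std::size_t countDistinct(const std::vector<std::string> &items) {
  // Maps a hash value to the indices of distinct items seen with that hash.
  std::unordered_map<std::size_t, std::vector<std::size_t>> buckets;
  std::size_t distinct = 0;
  for (std::size_t i = 0; i < items.size(); ++i) {
    std::vector<std::size_t> &bucket =
        buckets[std::hash<std::string>()(items[i])];
    bool duplicate = false;
    for (std::size_t j : bucket) {
      if (items[j] == items[i]) { // full comparison, same-hash items only
        duplicate = true;
        break;
      }
    }
    if (!duplicate) {
      bucket.push_back(i);
      ++distinct;
    }
  }
  assert(distinct <= items.size());
  return distinct;
}
```

Items with different hashes are never compared in full, which is where the saving comes from when most candidates differ.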

Clang commits

The WebKit brace style is now supported by clang-format. r244446.

Other project commits

Statistics collection in the OpenMP runtime has been tidied up and expanded. r244677.

LLVM Weekly - #84, Aug 10th 2015

Mon, 10 Aug 2015 05:36:00 +0000

Welcome to the eighty-fourth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Adrian Sampson has written a fantastic introduction to LLVM. It's titled LLVM for Grad Students, but it should be useful for anybody looking to use LLVM or just wanting to understand it better.

Brandon Holt has written up a short and helpful post giving hints and tips on debugging LLVM.

The move of the mailing lists from UIUC on to lists.llvm.org is now complete. All public LLVM-related mailing lists are shown here. List addresses have now changed to [email protected].

There's been some exciting activity in the world of GCC. Support for the draft C++ Concepts TS has been committed. A draft of the technical specification is available here. Additionally, Nick Clifton has posted a useful summary of GNU toolchain developments for July/August.

On the mailing lists

Rong Xu has shared an RFC for late instrumentation in LLVM. The RFC describes (and quantifies) the performance cost of inserting instrumentation for profile-guided optimisation in the frontend and proposes approaches for adding instrumentation in the middle-end instead.
Jingyue Wu kicked off a discussion about modifying the BasicAA alias analysis to understand whether there is aliasing between different address spaces.
Teresa Johnson has posted RFCs for the ThinLTO file API and data structures and the ThinLTO file format.
Chandler Carruth is looking to enable GlobalsModRef in the default pass pipeline. He's found it performance neutral in his tests so far, but would appreciate more people trying it out and benchmarking it.
Asking for advice on how to get started with LLVM is very common. Few take as much time on detailing their background, motivations, and proposed plan as Arno Bastenhof in a recent message. Probably because of this, his request has attracted some very high quality replies.

LLVM commits

A handy new LLVM Support header was introduced. The TrailingObjects template class abstracts away reinterpret_cast, pointer arithmetic, and size calculation needed for the case where a class has some other objects appended to the end of it. r244164.
Initial documentation for the Machine IR serialization format has been written. r244292.
Uniquable DICompileUnits have been disallowed. Old bitcode will be automatically upgraded and the sed script in the commit message should be useful for updating out-of-tree testcases. r243885.
All of the TargetTransformInfo cost APIs now use int rather than unsigned. r244080.
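
To show the kind of boilerplate that TrailingObjects is meant to hide, here is a hand-rolled sketch of a header object followed by a trailing array in a single allocation. IntList is a hypothetical class and does not use the LLVM template; it only illustrates the reinterpret_cast, pointer arithmetic, and size calculation mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical header object with `count_` ints stored directly after it.
class IntList {
  std::size_t count_;

  explicit IntList(std::size_t n) : count_(n) {}

  // The trailing array starts immediately after the header object.
  int *trailing() { return reinterpret_cast<int *>(this + 1); }

public:
  IntList(const IntList &) = delete;
  IntList &operator=(const IntList &) = delete;

  // Allocates the header and `n` trailing ints in one block.
  static IntList *create(std::size_t n) {
    static_assert(alignof(IntList) >= alignof(int),
                  "trailing ints must be suitably aligned");
    void *mem = std::malloc(sizeof(IntList) + n * sizeof(int));
    if (!mem)
      throw std::bad_alloc();
    IntList *list = new (mem) IntList(n);
    for (std::size_t i = 0; i < n; ++i)
      new (list->trailing() + i) int(0); // zero-initialise each element
    return list;
  }

  static void destroy(IntList *list) { std::free(list); }

  std::size_t size() const { return count_; }

  int &operator[](std::size_t i) {
    assert(i < count_ && "index out of range");
    return trailing()[i];
  }
};
```

Every class written this way has to repeat the same size and alignment reasoning, which is the repetition a shared helper can remove.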

Clang commits

A new checker for code-level localizability issues on OSX/iOS was born. It will warn about the use of non-localized NSStrings passed to UI methods and about failing to include a comment in NSLocalizedString macros. r244389.
New AST matchers have been introduced for constructors that are default, copy, or move. r244036.

Other project commits

The old COFF linker in LLD has been removed in favour of the new, faster, and simpler implementation. r244226.
ThreadSanitizer is now enabled for AArch64. r244055.

LLVM Weekly - #83, Aug 3rd 2015

Mon, 03 Aug 2015 06:21:00 +0000

Welcome to the eighty-third issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The CodeChecker static analysis infrastructure built on Clang Static Analyzer has been released. The slides from the talk at EuroLLVM earlier this year give a good overview.

LLVM/Clang 3.7 RC2 has been tagged. Time to get testing.

The implementation of the Picon Control Flow Integrity protection mechanism has been released. See also the associated paper Picon: Control Flow Integrity on LLVM IR.

On the mailing lists

There is a plan to jump to Windows 7 as the baseline requirement for LLVM, a topic which was featured in LLVM Weekly many months ago. If such a move would cause a problem for you, now is the time to speak up. There's also a proposal to drop support for the old mingw.org toolchains in favour of the maintained mingw-w64.
James Molloy has prototyped LoopEditor, a high-level API for loop transformations in LLVM and is seeking additional feedback. The hope is that existing loop transformations could be rewritten and simplified to use it.
Easwaran Raman has shared an RFC on speedup estimation for inlining cost analysis. The idea is that the estimated speedup (reduction in dynamic instruction count) from performing the inlining should be used as part of the cost metric.
Chris Bieneman has penned a CMake in LLVM roadmap and July CMake update. There's even hope the autoconf build system might be marked deprecated before the 3.8 branch.
Peter Collingbourne has shared a proposal for arbitrary relocations in constant global initialisers.
Try not to panic, but the LLVM mailing lists will be down on August 4th as they will be moving off the UIUC servers. Additionally, SVN access will be read-only and the LLVM bugzilla will be down.
Mehdi Amini has issued a helpful notice for out-of-tree maintainers on the removal of RegisterScheduler::setDefault.
Lang Hames has written a whirlwind introduction to implementing lazy JITting support for a new architecture in Orc.
Michael Schlottke-Lakemper kicked off a discussion on the possibility of using lldb.so to create a stack trace. Respondents pointed out a number of possible choices, including using the llvm::sys::printStackTrace() function.

LLVM commits

A new exception handling representation has been introduced for MSVC compatibility. The commit includes the appropriate updates to the LLVM language reference. r243766.
A test to check bitcode compatibility has been added. This will help ensure the bitcode format produced by an X.Y release is readable by the following X.Z releases. r243779.
The lli documentation has been updated and now better explains its purpose. r243401.
LLVM gained a target-independent thread local storage (TLS) implementation. r243438.
A reverse(ContainerTy) range adapter was added. r243581.
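
For readers unfamiliar with range adapters, here is a minimal sketch of what a reverse adapter can look like. ReversedRange, reversed, and collectReversed are hypothetical names and this is not the code that landed in LLVM; it just wraps a container so that a range-based for loop walks it back to front.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Wraps a container so that begin()/end() yield its reverse iterators.
template <typename Container> class ReversedRange {
  Container &container_;

public:
  explicit ReversedRange(Container &c) : container_(c) {}

  auto begin() const -> decltype(std::declval<Container &>().rbegin()) {
    return container_.rbegin();
  }
  auto end() const -> decltype(std::declval<Container &>().rend()) {
    return container_.rend();
  }
};

// Helper that deduces the container type (including constness).
template <typename Container>
ReversedRange<Container> reversed(Container &c) {
  return ReversedRange<Container>(c);
}

// Example use: copy the elements of `input` in reverse order.
std::vector<int> collectReversed(const std::vector<int> &input) {
  std::vector<int> out;
  for (int v : reversed(input))
    out.push_back(v);
  assert(out.size() == input.size());
  return out;
}
```

The adapter holds only a reference, so the wrapped container must outlive the loop.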

Clang commits

The method for emitting metadata for loop hint pragmas has been modified, using CGLoopInfo. r243315.
Clang learned to pass -Wa,-mfpu, -Wa,-mhwdiv, and -Wa,-mcpu to the integrated assembler. r243353.
Initial support for OpenMP 4.1's extended ordered clause was added. r243635.

Other project commits

lldb is starting to gain support for indicating when you are debugging a function that has been optimized. r243508.

LLVM Weekly - #82, Jul 27th 2015

Mon, 27 Jul 2015 03:01:00 +0000

Welcome to the eighty-second issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

I'd just like to highlight how much I really do appreciate people sending me links for inclusion, e.g. LLVM-related blog posts or new releases of software using LLVM (feature releases rather than simple bugfix updates). I'm not omniscient - if an interesting blog post or software release goes unmentioned here, I probably just didn't know about it!

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The call for papers for the 2015 LLVM Developers' meeting has now gone out. The submission deadline is August 20th. Registration is also now open.

John Regehr and his collaborators working on Souper have shared some initial results from the synthesizing superoptimizer. John is very interested in collecting representative IR from frontends other than Clang. There is also some discussion about these results on the mailing list.

Microsoft have open sourced their GDB/LLDB 'debug engine'.

On the mailing lists

Piotr Padlewski has shared a plan to improve Clang's devirtualization which he will be working on with Richard Smith at Google this Summer. The hope is to reduce any remaining performance gap between LLVM/Clang and GCC (which has better devirtualization support). The Google Doc can be read here.
Rafael Espíndola has shared some thoughts on handling ELF shared libraries in LLD.
Marshall Clow has kicked off a discussion on how C++ library TSes should be packaged. David Chisnall gives the FreeBSD perspective on ABI and API compatibility in libc++.
Eric Fiselier has posted an RFC on whether libc++ should support the atomic header in C++03.

LLVM commits

dsymutil gained support for one-definition-rule uniquing for C++ code. When linking the DWARF for a debug build of clang, it generates a 150M dwarf file instead of 700M. r242847.
The last remnants of the AliasAnalysis legacy update API have been removed. r242881.
LoopUnswitch can now unswitch multiple trivial conditions in a single pass invocation. r243203.

Clang commits

Clang gained the isFinal() AST matcher. r243107.

Other project commits

A new ELF linker has been born, based on the PE/COFF linker. r243161.
libcxx gained a default searcher for std::experimental::search. r242682.

LLVM Weekly - #81, Jul 20th 2015

Mon, 20 Jul 2015 06:34:00 +0000

Welcome to the eighty-first issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

I'm "on holiday" (at EuroPython) this week in Bilbao, mostly helping out the Raspberry Pi team with the education track. Do say hello, particularly if you want to chat lowRISC, LLVM, or Raspberry Pi.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

LLVM 3.6.2 has been released.

LLVM and Clang 3.7 have been branched.

The team behind Pyston, the LLVM-based Python JIT have written a blog post about their new object code caching feature.

On the mailing lists

Teresa Johnson has posted an RFC on ThinLTO symbol linkage and renaming.
Robert Lougher has written up a very detailed analysis of a case of poor register allocation, which may make interesting reading for some.
Chandler Carruth has written up an RFC on stateful alias analysis in LLVM. The plan assumes the new pass manager. GlobalsModRef is the trickiest case to handle, and Chandler has written a separate post about the problems with it.
Hal Finkel is interested in using new functionality introduced in C++11 (such as the final keyword) to improve devirtualization. The proposal generated quite a lot of discussion.
Juergen Ributzka kicked off a discussion on improving the maintenance and management of the LLVM C API. Eric Christopher suggests moving the C API to another project, so those who want/require a stable API can take on the burden of keeping it up to date. He elaborates on his proposal in response to Chris Lattner. There seems to be some support for providing C bindings with the same stability guarantees as the C++ API (i.e. it might break between major releases).
Hal Finkel has proposed an RFC on defining infinite loops in LLVM.
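
As background for the devirtualization discussion above, the sketch below shows why the final keyword is useful information for a compiler. The Shape and Square types are hypothetical, and whether a given compiler actually removes the virtual dispatch depends on its optimiser.

```cpp
#include <cassert>

struct Shape {
  virtual ~Shape() = default;
  virtual int area() const = 0;
};

// `final` promises that no class derives from Square.
struct Square final : Shape {
  int side;
  explicit Square(int s) : side(s) { assert(s >= 0); }
  int area() const override { return side * side; }
};

// Static type is Shape: the dynamic type is unknown here, so this call
// generally needs virtual dispatch.
int areaViaBase(const Shape &shape) { return shape.area(); }

// Static type is the final class Square: the dynamic type must be Square,
// so the compiler may call Square::area directly.
int areaViaFinal(const Square &square) { return square.area(); }
```

Both functions return the same value; the difference is only in what the compiler is allowed to assume at the call site.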

LLVM commits

The API to determine callee-save registers has been rewritten. r242165.
The 'debugger tuning' concept has been introduced, allowing the specification of the debugger the debug info should be optimised for. This defaults to lldb on OS X and FreeBSD and GDB for everything else (other than PS4, which defaults to the SCE debugger). r242388.
Intrinsics for absolute difference operations have been introduced. r242409.
The PostRAScheduler has been disabled for the Apple Swift CPU and MachineScheduler is used in its place. The commit message argues PostRAScheduler is not a good fit for out-of-order architectures and suggests the same switch might be worthwhile for other ARM OoO CPUs. r242500.

Clang commits

Support for armv7-windows-gnu targets has been added to the Clang front-end. r242292.
The clang module container format is now selectable from the command line (raw or obj). r242499.
A minimal AMDGPU toolchain configuration has been added. r242601.

Other project commits

LLD now supports MIPS big-endian targets. r242014.
LLDB's gdbserver is moving towards being a single-threaded application. r242018.
The OpenMP CMake build system has been massively refactored. r242298.

LLVM Weekly - #80, Jul 13th 2015

Mon, 13 Jul 2015 09:50:00 +0000

Welcome to the eightieth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The 2015 LLVM Developers' Meeting has been announced. It will take place on October the 29th and 30th in San Jose, California. Registration information and a call for papers will be sent out later in the month.

LLVM/Clang 3.6.2 has been tagged. All being well, we can expect 3.6.2 to be released soon.

On the mailing lists

Daniel Sanders has kicked off an amusingly named conversation on the trouble with triples. The thread gives context for a proposed change to move from ambiguous TargetTriple to unambiguous TargetTuples.
Juergen Ributzka has proposed a new StackMap format. The feedback seems positive so far.
The discussion regarding the analysis of responses to the C as used in practice survey has rumbled on. Chris Lattner has shared some of his thoughts on the freedom undefined behaviour gives the compiler to optimise.
Christos Margiolas has shared patches for his work on heterogeneous execution with LLVM.

LLVM commits

The Hexagon backend gained a BitTracker class. This is intended to be target independent. As described at the top of BitTracker.cpp, this is intended to be used with a target-specific machine instruction evaluator. There have been some other large additions to the Hexagon backend this week too. I hope the authors will consider giving another talk on their work at some point. r241595.
llc learnt the run-pass option, which will run one specific code generation pass only. r241476.
LLVM now has documentation on its inline assembly! r241698.
The llvm.frameescape and llvm.framerecover intrinsics have been renamed to localescape and localrecover. r241463.
Various refactoring commits have been made with the aim of having a single DataLayout used during compilation, owned by the module. r241775.
A new llvm.canonicalize intrinsic has been introduced, intended to be used to canonicalize floating point values. r241977.
The new argmemonly attribute can be used to mark functions that can only access memory through their argument pointers. r241979.

Clang commits

A few patches landed in Clang improving Objective-C support. This includes parsing, semantic analysis, and AST support for Objective-C type parameters, support for Objective-C type arguments, and the __kindof type qualifier. Douglas Gregor has more to say about these changes on the mailing list. r241541, r241542, r241548, and more.
Clang will now attach the readonly or readnone attributes when appropriate to inline assembly instructions, meaning the inline asm will not be treated as conservatively. For example, in some cases an inline asm block could be hoisted out of a loop. r241930.
PCH (pre-compiled headers) are now wrapped in an object file. r241690, r241620.
Clang now recognises the GCC-compatible -fprofile-generate and -fprofile-use flags. r241825.

Other project commits

libcxx added try_emplace and insert_or_assign to map and unordered_map, as specified in N4279. r241539.
The new LLD COFF linker now has basic support for x86 (it was previously x86-64 only). r241857.

LLVM Weekly - #79, Jul 6th 2015

Mon, 06 Jul 2015 16:23:00 +0000

Welcome to the seventy-ninth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

Last week I was in Berkeley for the second RISC-V conference. If you weren't able to make it, worry not because I liveblogged both day one and day two.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Stephen Cross has released llvm-abi, a library for generating LLVM IR that complies with platform ABIs.

This is a rather cute implementation of Tetris in C++ header files, compatible with Clang.

On the mailing lists

Kevin Atkinson asks whether to use MCJIT or ORCJIT. It sounds like ORC is working out well for the LLILC team.
Chandler Carruth has kicked off a discussion about the AliasAnalysis update interface and what should be done about it.
In response to a question, Evgeny Astigeevich has given a useful guide to finding the control dependence graph.
Manuel Klimek is updating the LLVM Phabricator install.

LLVM commits

The initial skeleton of the WebAssembly backend has been committed. It is not yet functional. r241022.
DIModule metadata nodes have been introduced. A DIModule is meant to be used to record modules imported by the current compile unit. r241017.
New exception handling intrinsics have been added for recovering and restoring parent frames. r241125.

Clang commits

Clang gained support for the x86 builtin __builtin_cpu_supports. r240994.
The Clang man pages have been converted to Sphinx (from .pod). r241037.

Other project commits

libcxx gained shared_mutex. r241067.
LLD has gained some generally applicable optimisations. e.g. devirtualizing SymbolBody and compacting its in-memory representation. r241001.
LLD's COFF linker can now link a working 64-bit debug build of Chrome. chrome.dll takes 24 seconds (vs 48 seconds for linking it with MSVC). r241318.
LLDB grew an example of scripted steps in Python. r241216.

LLVM Weekly - #78, June 29th 2015

Mon, 29 Jun 2015 03:51:00 +0000

Welcome to the seventy-eighth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

I'm in the Bay Area this week for the second RISC-V workshop where my colleague and I will of course be talking about lowRISC. If you're not able to make it, keep an eye on the lowRISC blog which I intend to keep updating semi-live with notes from the talks and presentations.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Hans Wennborg has shared the release plan for LLVM/Clang 3.7. This would see the release branch created on 14th July, with a final release targeted for 21st August.

A detailed analysis of the results of the "What is C in practice" survey has now been posted. The survey gained around 300 responses, and aims to help guide the definition of a formal model for the de facto standard of C (i.e. C as it is used rather than purely as specified in the ISO standard).

The 3.6.2-rc1 LLVM/Clang release has been tagged. As always, testing is encouraged.

On the mailing lists

Unfortunately at the time of writing GMANE seems to be having some problems, so for this week I'll be using links to the pipermail archives of the relevant mailing list posts.

Adrian Prantl has a proposal for improving the quality of debug locations and of DbgValueHistoryCalculator. This will involve adding new heuristics so DbgValueHistoryCalculator is smarter at creating ranges.
Sanjoy Patel kicked off a discussion about conversion between bitwise AND and short-circuit evaluation for booleans. In particular, he wants to ensure that short-circuit evaluation is not used (though as is pointed out in the thread, the benefit of saving the branch is highly microarchitecture dependent).
Dan Liew has posted an RFC on improving the testing of exported LLVM CMake targets. He has produced a toy project making use of these targets, which could be used to help ensure they stay in working shape.
Bjarke Roune is proposing functionality to determine when a poison value is guaranteed to produce undefined behaviour, with the aim of improving handling of nsw/inbounds and similar in LLVM passes such as scalar evolution.
Philip Reames has responded to a question from Chandler Carruth with a good summary of the motivation for supporting inlining through statepoints and patchpoints.

LLVM commits

The InterleavedAccess pass has been introduced to identify interleaved memory accesses so they can be transformed into target-specific intrinsics. r240751.
Initial serialisation of machine instructions has been added, representing MachineInstructions in YAML. r240295, r240425, and more.
The CaptureTracking pass has been optimised to improve performance on very large basic blocks. r240560.
A parser for LLVM stackmap sections has been added and made available through llvm-readobj. r240860.

Clang commits

The recently added nullability attributes have been extensively documented. r240296.
constexpr can now be specified in condition declarations. r240707.

Other project commits

The README for the COFF linker in LLD has been updated with new performance numbers. It now takes 3.5 seconds to self-host (previously 5 seconds), compared with 7 seconds for the MSVC linker and 30 seconds for the old LLD. r240759.
The safestack TODO list in compiler-rt has been updated. r240473.
LLD gained support for thread-local storage in MachO objects. r240454.
Polly has had a meaningful improvement in compile time through enabling the small integer optimisation of the ISL (Integer Set Library). Polybench benchmarks on average take 20% less time to compile. r240689.

LLVM Weekly - #77, Jun 22nd 2015

Mon, 22 Jun 2015 04:52:00 +0000

Welcome to the seventy-seventh issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

I'll be in California next week for the second RISC-V workshop. My colleague Wei and I will both be giving talks about recent lowRISC progress. Say hi if you're going to be there. I might have some spare time towards the end of the week too if anyone wants to meet up.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

WebAssembly has been announced. It is a new collaboration between browser vendors to define a new binary executable format that can be used as a compilation target. A good summary is available here on the emscripten mailing list.

Tilmann Scheller has written up a pair of blog posts about improving build times of Clang. He steps through a wide range of generic approaches (using Ninja, ccache, the gold linker, LTO+PGO in the host compiler etc etc) and some specific to Clang/LLVM.

The Cambridge LLVM Social will be taking place on Wed 24th June, 7.30pm at the Blue.

On the mailing lists

Dan Gohman has posted an RFC for the inclusion of a WebAssembly backend in LLVM. It seems like everyone is in favour of the proposed approach.
Yaxun Li has posted a revised RFC on adding a SPIR-V target to LLVM. There still seems to be some push-back on the proposed approach. Chandler Carruth makes an argument that SPIR-V should make use of the existing SelectionDAG legalization layer.
Igor Laevsky is seeking more feedback on adding an attribute to mark that a function only accesses memory through its arguments. Philip Reames points out in the thread that this isn't a new concept to LLVM, except right now such an attribute can only be specified on intrinsics.
Philip Reames is looking for feedback on his plan to implement profile-guided inlining.
Diego Novillo has posted an RFC to enable the -fprofile-generate and -fprofile-use Clang flags. Unsurprisingly, people are in favour of supporting these flags for GCC compatibility.

LLVM commits

Some initial support for 'fault maps' and a FAULTING_LOAD_OP, intended for use in a managed language runtime, has been added. The new ImplicitNullChecks pass will fold null checks into nearby memory operations. r239740, r239743.
The SafeStack pass to protect against stack-based memory corruption errors has been added. r239761.
All temporary symbols are now unnamed. This saves a small amount of memory. r240130.
There's been some enhancement to the heuristics for switch lowering. r240224.

Clang commits

The -fsanitize-trap= flag has been introduced, which will be used to control if the given sanitizer traps upon detecting an error. r240105.
Appropriate bitsets for use by LLVM's control flow integrity implementation can now be emitted for the Microsoft ABI. r240117.
Kernel AddressSanitizer now has basic support. r240131.
Clang learned to recognise type nullability specifiers. r240146.

Other project commits

LLDB learnt how to use hardware watchpoints for MIPS. r239991.
Compression support has been added to LLDB's implementation of the gdb-remote protocol. r240066.

LLVM Weekly - #76, Jun 15th 2015

Mon, 15 Jun 2015 09:49:00 +0000

Welcome to the seventy-sixth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The big news this week is that Apple have announced Swift 2.0 and, perhaps more importantly, that Swift will be open source later this year. The intention is that iOS, OS X and Linux will be supported at release.

On the mailing lists

Quentin Colombet has posted a helpful clarification on the restrictions of a MachineFunctionPass.
Getting frustrated maintaining your out-of-tree LLVM backend? It's comforting for me at least to know things could be worse. Patrik Hägglund reports he and his colleague maintain patches to support 16-bit bytes and 24/40-bit MachineValueTypes.
Chandler Carruth is planning some AliasAnalysis refactoring for the new pass manager. He's asking for feedback on the general plan as well as the all-important question of naming conventions.
Zia Ansari has shared an interesting summary of an investigation of performance swings on Intel architectures related to the post-decode micro-op cache. A PPTX-formatted summary is available here.

LLVM commits

The loop vectorizer gained an optimisation for interleaved memory access. It is disabled by default but can be turned on using -enable-interleaved-mem-accesses=true. An AArch64InterleavedAccess pass was also added. r239291, r239514.
A prototype for 32-bit SEH (Structured Exception Handling) has been added. r239433.
LLVM has grown LibDriver and llvm-lib, intended to provide a lib.exe compatible utility. r239434.
x86 gained a new reassociation MachineCombiner optimisation to increase ILP. r239486.
The R600 backend has now been renamed to AMDGPU. r239657.

Clang commits

Support for C99 partial re-initialization behaviour has been implemented. r239446.
Clang gained support for the BPF backend. r239496.
The loop vectorize pragma now recognises assume_safety. This will tell loop access analysis to skip memory dependency checking. r239572.
The target attribute is now supported. Much like GCC's target attribute, it allows adding subtarget features and changing the CPU for a particular function. r239579.

Other project commits

The COFF linker in LLD continues to get faster. r239332, r239292.
LLDB grew a TypeSystem interface to support adding non-clang languages (though it seems it's reverted for now). r239360.

LLVM Weekly - #75, Jun 8th 2015

Mon, 08 Jun 2015 04:40:00 +0000

Welcome to the seventy-fifth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Botond Ballo has posted a wonderfully thorough summary of the recent Lenexa C++ standards meeting, even including a table to summarise the status of various major proposals.

I have somehow neglected to mention the Crystal language previously. It is a statically typed language with syntax inspired by Ruby which (of course) compiles using LLVM. It was discussed last week on Hacker News.

icGrep has been released. It makes use of the 'Parabix' text representation and LLVM for high performance regex matching. More details are available at the icGrep homepage.

The winners of the 7th Underhanded C Contest have now been announced online. Congratulations to the winner, Karen Pease, for creating such a monstrous piece of code.

On the mailing lists

Chandler Carruth has posted a summary of a recent in-person discussion about LLD's future and design. It looks like this was a very positive meeting with agreement in important areas. The recently contributed experimental COFF linker is going to be evaluated to see if its linking model would be appropriate for Darwin. If so, the hope is work can focus on adopting that as the standard model. If not, more work will need to be done on refactoring LLD and making sure that code which makes sense to be shared is.
Christos Margiolas has been working as an intern at the Qualcomm Innovation Center on support for heterogeneous compute, including transparent offloading of loops or functions to accelerators. He is asking for feedback and looking to see if there is interest in getting this upstream. He has shared a slide deck which gives more details.
Woodrow Barlow is interested in implementing a new PIC backend for LLVM. Renato Golin gave a very thorough and helpful response about how you might proceed.
Frank Winter is looking for a way to replace a sequence of repetitive code with a loop. It was pointed out that the LLVM loop reroll pass should be helpful for this, but it does need to run on an existing loop. This would mean it requires modification or the IR should be modified to introduce a trivial loop before running the reroll pass.
Philip Reames has posted an RFC on adding a liveoncall parameter attribute. This would be used to leave an argument marked as live even if it isn't actually used (so it might be later inspected at runtime). Chris Lattner queried whether adding an intrinsic might be a better approach.

LLVM commits

LLVM gained support for the new AArch64 v8.1a atomic instructions. r238818.
The MPX (Intel Memory Protection eXtensions) feature bit and bound registers are now supported on the X86 backend. r238916.
MIPS FastISel gained more instruction and intrinsic implementations. r238756, r238757, r238759.
With the introduction of MCSymbolELF, the base MCSymbol size is now reduced to 48 bytes on x86-64. r238801.
Work has started on porting AliasAnalysis to the new pass manager. r239003.
The BPF backend now supports big and host endian, in addition to the previously supported little endian. r239071.
The naming and structure of the recently added unroll heuristics has been modified. r239164.

Clang commits

-mcpu for ARM will now ignore the case of its arguments. r239059.
A mass of predefined vector functions for PowerPC has been added. r239066.
The concept and requires keywords (as used in the C++ Concepts TS) are now lexed. Let's hope this starting point is followed up with work towards full concepts support in the coming months. r239128.

Other project commits

The lld COFF linker gained an initial implementation of link-time optimisation. r238777.
LLDB gained support for software emulation of the MIPS64 branch instructions. r238820.
libiomp5 is now libomp. r238712.

LLVM Weekly - #74, Jun 1st 2015

Mon, 01 Jun 2015 06:35:00 +0000

Welcome to the seventy-fourth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

You may be interested in the second RISC-V workshop, which will be held in Berkeley June 29-30. Early bird registration ends today, but academics can register for free. My colleague Wei and I will be there representing lowRISC.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

MPI-Checker, a static analysis tool for MPI code has been released. It is of course implemented using Clang's Static Analyzer framework.

The LLVM-HPC2 workshop will be held on November 15th, co-located with SC15. The call for papers has been published. Submissions are due by September 1st.

On the mailing lists

The discussion about improving LLD has resumed. Rui Ueyama has written about his recent patch for a section-based PE/COFF linker. Sean Silva argues that refactoring might be a better approach.
Hans Wennborg has shared a preliminary release plan for LLVM/Clang 3.7. This would see the codebase branched on the 14th of July with a release target of the end of August. Tom Stellard has also shared a schedule for the 3.6.2 release. The deadline for patches is June 15 with the release date targeted for June 29th.
Teresa Johnson has posted an updated RFC of her ThinLTO implementation plan. There are still concerns about the wrapping of LLVM bitcode in ELF. Alex Rosenberg suggests implementing an objcopy replacement for LLVM and extending bitcode as necessary as an alternative approach.
Chris Bieneman has posted a status update on the quest to replace the autoconf build system with CMake.
Quentin Colombet is seeking benchmark reports (or indeed bugs!) for the recently added 'shrink wrapping'.
Matthias Braun has left a quick note for maintainers of out-of-tree targets that pristine register semantics have been modified slightly.
Chandler Carruth has posted some thoughts on next steps for loop unrolling analysis.

LLVM commits

Codegen for memcpy on Thumb2 has been improved to make use of load/store multiple (more details of how this works in the commit message, worth a read for those interested). r238473.
Popcount on x86 will now be implemented using a lookup table in register technique. r238636, r238652.
Work continues on reducing peak memory usage through optimisations to debug info. r238364.
Initial support for the convergent attribute has landed. r238264.
The documentation about the current state of LLVM's Phabricator has been updated along with a call for volunteers to help develop necessary improvements and modifications to Phabricator's PHP codebase. r238295.
MCJIT gained support for MIPS64r2 and MIPS64r6. r238424.

Clang commits

Clang's handling of -fopenmp has been rewritten. r238389.
The user documentation on profiling has been extended. r238504.

Other project commits

A new PE/COFF section-based linker has been added to lld. This follows on from discussions about the direction of lld and whether it makes sense to build on top of the atom model. The linker is able to self-link on Windows and is significantly faster than the current implementations (1.2 seconds vs 5 seconds, even without multi-threading). It also takes only 250MB of RAM to self-link vs 2GB. r238458.
LLDB on Windows can now demangle Linux or Android symbols. r238460.

LLVM Weekly - #73, May 25th 2015

Mon, 25 May 2015 09:39:00 +0000

Welcome to the seventy-third issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The LLVM blog has properly announced full support for OpenMP 3.1 in Clang.

The Clang-derived Zapcc has had some attention this week. It claims higher compilation speeds than the baseline Clang or other compilers. Yaron Keren, the principal developer has shared many more details about its implementation on the Clang mailing list.

On the mailing lists

The discussion about upstreaming the LLVM/SPIR-V converter has continued. Chandler Carruth has responded with feedback, and Philip Reames has shared his concerns about the merge proposal as-is. Neil Henning has responded to some of these concerns.
Adam Nemet has kicked off a thread about alias-based loop versioning, with the hope that others working in the area can chime in.
Félix Cloutier queries why MemoryDependencyAnalysis reports dependencies between NoAlias pointers. Daniel Berlin points to his very interesting looking work on MemorySSA.
Duncan P.N. Exon Smith has posted an RFC on reducing the memory footprint of debug info entries. The attached patches reduce peak memory usage from 920MB to 884MB for the tested workload.
John Criswell has a helpful answer regarding how to determine whether a branch instruction may depend on function parameters.
Andrew Kaylor has shared a detailed description of the work to be done for exception handling on Windows.
Andrew Bokhanko is looking for feedback on adding an option to control a level of OpenMP support in Clang. Now that 3.1 support is complete, OpenMP 4.0 is the next target but this is likely to remain incomplete for some time. The question is whether those features which are implemented are available by default, or whether users should opt-in with a compiler flag while support remains incomplete.

LLVM commits

The dereferenceable_or_null attribute will now be exploited by the loop invariant code motion pass. r237593.
Commits have started on the 'MIR serialization' project, which aims to print machine functions in a readable format. r237954.
A GCStrategy for CoreCLR has been committed alongside some documentation for it. r237753, r237869.
libFuzzer gained some more documentation. r237836.
libFuzzer can now be used with user-supplied mutators. r238059, r238062.

Clang commits

-fopenmp will turn on OpenMP support and link with libiomp5 (libgomp can alternatively be specified). r237769.
The -mrecip flag has been added to match GCC. r238055.

Other project commits

C++1z status for libcxx has been updated. r237606.
std::bool_constant and uninitialized_copy() were added to libcxx. r237636, r237699.
libcxx gained a TODO list. Plenty of tasks that might be interesting to new contributors. r237813, r237988.
LLDB has enabled debugging of multithreaded programs on Windows and gained support for attaching to a process. r237637, r237817.

OpenMP Support

Fri, 22 May 2015 08:23:00 +0000

OpenMP support in the Clang compiler is complete! Every pragma and clause from version 3.1 of the standard is supported in full, including combined directives (like ‘#pragma omp parallel for’ and ‘#pragma omp parallel sections’). In addition, some elements of OpenMP 4.0 are supported as well. This includes “almost complete” support for ‘#pragma omp simd’ and full support for ‘#pragma omp atomic’ (combined pragmas and a couple of clauses are still missing).

OpenMP enables Clang users to harness the full power of modern multi-core processors with vector units. Pragmas from OpenMP 3.1 provide an industry-standard way to employ task parallelism, while ‘#pragma omp simd’ is a simple yet flexible way to enable data parallelism (aka vectorization).

Clang's implementation of the OpenMP standard relies on the LLVM OpenMP runtime library, available at http://openmp.llvm.org/. This runtime supports ARM® architecture processors, PowerPC™ processors, and 32- and 64-bit X86 processors, and provides ABI compatibility with GCC and Intel's existing OpenMP compilers.

To enable OpenMP, just add ‘-fopenmp’ to the command line and provide paths to the OpenMP headers and library with ‘-I <path to omp.h> -L <LLVM OpenMP library path>’.

To run a compiled program you may need to provide a path to the shared OpenMP library as well:

$ export LD_LIBRARY_PATH=<OpenMP library path>:$LD_LIBRARY_PATH

or:

$ export DYLD_LIBRARY_PATH=<OpenMP library path>:$DYLD_LIBRARY_PATH

on Mac OS X.

You can confirm that the compiler works correctly by trying this simple parallel C program:

#include <omp.h>
#include <stdio.h>

int main() {
#pragma omp parallel
  printf("Hello from thread %d, nthreads %d\n", omp_get_thread_num(), omp_get_num_threads());
}

Compile it (you should see no errors or warnings):

$ clang -fopenmp -I <path to omp.h> -L <LLVM OpenMP library path> hello_openmp.c -o hello_openmp

and execute:

$ export [DY]LD_LIBRARY_PATH=<OpenMP library path>:$[DY]LD_LIBRARY_PATH

$ ./hello_openmp

You will see more than one “Hello” line with different thread numbers (note that the lines may be mixed together). If you see only one line, try setting the environment variable OMP_NUM_THREADS to some number (say 4) and try again.

Hopefully, you will enjoy using OpenMP and witness dramatic boosts in your applications’ performance!

LLVM Weekly - #72, May 18th 2015

Mon, 18 May 2015 05:56:00 +0000

Welcome to the seventy-second issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

Some of you may be interested that over at the lowRISC project, we've announced the full set of summer student projects we're supporting.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The Rust programming language, which of course uses LLVM as its compiler backend, has just released version 1.0.

The next Cambridge LLVM Social will take place on Wednesday 20th May at the Cambridge Beer Festival.

On the mailing lists

Teresa Johnson has posted an RFC on her ThinLTO implementation plan. It's seen a lot of feedback, too much for me to hope to summarise, though much of it was around emitting bitcode wrapped in ELF.
Yaxun Liu is looking to upstream the Khronos Group LLVM to SPIR-V converter. There was some discussion as to whether SPIR-V makes most sense as a backend or as a serialisation format alongside the LLVM bitcode output.
Reid Kleckner has written up an RFC on a new exception handling representation for MSVC, given that the Itanium EH representation has been found insufficient.
Chris Matthews proposes changing LNT's regression detection algorithm, with the aim being to reduce the number of false positives to the extent that people will be motivated to investigate reported regressions.
Matt Arsenault is proposing upstreaming an LLVM backend for HSAIL. HSAIL presents a virtual target machine (in a way similar to NVPTX), and is defined by the HSA Foundation.
Owen Anderson proposes a new convergent attribute aiming to make LLVM more suitable for SPMD/SIMT programming models. So far, all feedback has been very positive.

LLVM commits

The ARM backend has been updated to use AEABI aligned function variants. r237127.
The heuristic for estimating the effect of complete loop unrolling has been reimplemented. r237156.
Statepoints are now 'patchable'. r237214.
Support for function entry count metadata has been added. r237260.
A new loop distribution pass has been born. It is off by default. r237358.
New SelectionDAG nodes have been added for signed/unsigned min/max. r237423.
A simple speculative execution pass, targeted mainly at GPUs has been added. r237459.

Clang commits

The little endian SPARC target has been added to clang. r237001.
clang-format's formatter has undergone some refactoring, which also led to a few bug fixes. r237104.
Documentation on adding new attributes has seen a significant update. r237268.

Other project commits

libcxx learnt std::experimental::sample. r237264.
lldb has enabled multithreaded debugging on Windows. r237392.
lldb can now set and clear hardware watchpoints and breakpoints on AArch64. r237419.
lldb gained an assembly profiler for mips32. r237420.

LLVM Weekly - #71, May 11th 2015

Mon, 11 May 2015 02:42:00 +0000

Welcome to the seventy-first issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The implementation of OpenMP 3.1 in Clang/LLVM is now complete. Well done to everyone involved.

Most slides from the presentations at EuroLLVM 2015 are now online. Video is coming soon.

Version 3.5.1 of Clang UPC, the Unified Parallel C compiler has been released. The main change seems to be the move to Clang/LLVM 3.5.

The Pony Language, which features an LLVM backend, has recently been released. It received quite a lot of discussion on Hacker News.

Many readers might be interested in this update from the last C++ standardization committee meeting.

IBM have posted some bounties for TSAN support and MSAN support for PPC64.

On the mailing lists

Renato Golin has been asking about interest in improving the LLVM online code coverage report. Joshua Cranmer shared the work he did on code coverage for Thunderbird and Firefox.
John Criswell has followed up on an older thread about the LLVM DSA work giving some useful insight, and some more.
Hubert Tong is interested in working on implementing the C++ Concepts Technical Specification in Clang, and would like anyone who's interested in collaborating or has already made a start to get in touch.
Quentin Colombet has posted a heads-up about his shrink-wrap pass work, including details on how to enable support in your backend (in-tree or out-of-tree).
Last week's discussion about improving LLD has rumbled on. Chris Lattner suggested working on two linkers, one to serve the needs of those who primarily want a usable BSD-licensed system linker and another a 'next generation' linker trying to meet the original aims of LLD, developed without the same constraints on compatibility. Alex Rosenberg gave a good summary of the original aims of LLD and how recent changes have moved it further from those aims. It looks like a path forwards is being identified.

LLVM commits

A new 'shrink-wrap' pass has been added. It attempts to insert the prologue and epilogue of a function somewhere other than the entry/exit blocks. See the commit message for a motivating example. r236507.
Support for the z13 processor and its vector capabilities has been added to the SystemZ backend. r236520, r236521.
Documentation has been written for the new masked gather and scatter intrinsics. r236721.
The statepoint intrinsic has been extended to allow statepoints to be marked as transitions from GC-aware code to non-GC-aware code. r236888.

Clang commits

Clang support for the z13 processor was added. r236531.
Thread-safe initialization using the MSVC 2015 ABI has been implemented. r236697.
User-friendly -fsanitize-coverage= flags are now available. r236790.

Other project commits

libiomp's CMake has been integrated into the LLVM CMake build system, so you can now checkout libiomp and have it built alongside llvm, clang and so on. r236534.

LLVM Weekly - #70, May 4th 2015

Mon, 04 May 2015 05:33:00 +0000

Welcome to the seventieth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Microsoft have announced their intention to make use of the Clang frontend on Windows.

Bjarne Stroustrup has recently been talking about potential C++17 features.

The Visual C++ developers are going to be open-sourcing their GDB/LLDB debug engine.

The projects accepted into Google Summer of Code for LLVM have been announced. Four student projects have been accepted.

The next Bay Area LLVM social is scheduled for 7pm on Thursday the 7th of May. Please sign up if you are attending.

On the mailing lists

Rui Ueyama has been doing quite a lot of work on LLD of late and has proposed an LLD improvement plan. In it, he proposes some major changes that would hopefully ease the path to LLD becoming a fully functional ELF, Mach-O, and PE-COFF linker. The two main proposals are to use the 'section' rather than the 'atom' model and to stop trying to bend the Unix model to work on other platforms, instead directly implementing the necessary native behaviour. There are understandably some concerns that this direction could result in LLD having to maintain essentially three linkers, but discussion is ongoing and much feedback seems positive.
Alex, who will be interning at Apple this summer, has posted an RFC on a proposed machine level IR text serialisation format. It came out a little mangled on Gmane so you may prefer to read the pipermail rendering. A lot of the feedback revolves around the pros and cons of a YAML-based format.
Andrey Bokhanko suggests replacing libgomp with libiomp as the default OpenMP runtime library when using -fopenmp. Ultimately there seems to be agreement and the only issue seems to be on the library naming.
Nico Weber reports that although -gline-tables-only makes debug info much smaller, they've found with Chromium the resulting stackframes aren't that usable without function parameters and namespaces. The proposal is to add a new variant that does include function parameter info and namespace info.

LLVM commits

The LLVM performance tips document has gained another two entries. r235825, r235826.
llvm-symbolizer now works on Windows. r235900.
SelectionDAG, DAGCombiner and codegen support for masked scatter and gather has been added. r235970, r236211, r236394.
Debug locations have been added to all constant SelectionDAG nodes. r235989.
Dragonegg support has been dropped from the release script. r236077.
The debug info IR constructs have been renamed from MD* to DI*. Duncan suggests that if you're updating an out of tree target, it may be easiest to first get things compiling with the code from before this commit, then continue the merge. r236120.

Clang commits

Clang can now generate dependencies in the style accepted by the NMake and Jom build tools. r235903.
New AVX-512 intrinsics have been added. r235986, r236218.
Clang learned -Wpessimizing-move and -Wredundant-move warnings. r236075.
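For readers who have not met these two warnings, here is a minimal sketch of the kind of code each one is aimed at. The function names are made up for illustration, and the exact diagnostics depend on your Clang version and flags.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Returning a local by name leaves the compiler free to elide the copy/move.
std::vector<std::string> MakeNames() {
  std::vector<std::string> Names = {"alpha", "beta"};
  return Names; // preferred form
}

// Wrapping the local in std::move prevents copy elision. This is the pattern
// -Wpessimizing-move is meant to flag.
std::vector<std::string> MakeNamesPessimized() {
  std::vector<std::string> Names = {"alpha", "beta"};
  return std::move(Names);
}

// A by-value parameter is already treated as an rvalue in a return statement,
// so the std::move adds nothing. This is the pattern -Wredundant-move targets.
std::string Echo(std::string S) {
  return std::move(S);
}

// All variants produce the same values; only the cost and warnings differ.
void CheckMoveExamples() {
  assert(MakeNames() == MakeNamesPessimized());
  assert(Echo("x") == "x");
}
```

In both flagged cases the fix is simply to drop the std::move and return the variable by name.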

Other project commits

LLDB gained support for the SysV ABI on ARM and AArch64. r236097, r236098.
The LLVM test suite gained a frame_layout test. r236085.

LLVM Weekly - #69, Apr 27th 2015

Mon, 27 Apr 2015 05:41:00 +0000

Welcome to the sixty-ninth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Ed Jones at Embecosm has written about his work to use the GCC regression suite on Clang and documented how to run the LLVM test suites on an embedded target.

GCC 5.1 has now been released. Congratulations to the GCC team. The versioning scheme is potentially confusing - 5.0 is the development version which saw a stable release as 5.1. The next minor releases will be 5.2, 5.3 etc and the next major release will be 6.1 (referred to as 6.0 during development).

On the mailing lists

Sanjoy Das has posted an RFC on supporting implicit null checks in LLVM. This is intended to support managed languages like Java, C#, or Go where a null check is required before using pointers.
Alex L interned at Apple last year, and is interning again this summer. He's posted to the list about his project, which is to develop a text-based human readable format that allows LLVM to serialize the machine-level IR. The intention is to make debug and testing easier. He welcomes any feedback or suggestions.
libunwind is moving to its own repository. Hopefully a git mirror will go live soon.
Roel Jordans gave a talk at EuroLLVM this year about his software pipelining pass. He has posted to the mailing list to give a few more details and share his source code.
Tom Stellard is looking to increase the number of code owners, i.e. the set of people who review patches or approve merge requests to stable branches on a certain part of the code. His plan is to start nominating new code owners based on git history whenever he gets a new stable merge request for an unowned component.

LLVM commits

Functions can now have metadata attachments. r235785.
The stack object allocation for Hexagon has been completely overhauled. r235521.
The vim support files have been updated. Changes include a new indentation plugin for .ll files. r235369.
llvm-link learned the -override option to override linkage rules. r235473.
There is now textual IR support for an explicit type parameter to the invoke instruction (much like for the call instruction). r235755.

Clang commits

Documentation has been added for SanitizerCoverage (simple code coverage using the Sanitizer tools). r235643.
Clang's __attribute__((aligned)) can now set alignment to a target-specific value, rather than just assuming 16 bytes on all platforms. r235397.
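As a rough illustration of the attribute change above, the sketch below declares one type with the bare `aligned` attribute and one with an explicit value. It assumes a GCC-compatible compiler such as Clang; the type names are hypothetical, and the alignment picked for the bare form depends on the target, so no specific number is assumed.

```cpp
#include <cassert>
#include <cstddef>

// Bare form: the compiler chooses a target-specific default alignment.
struct __attribute__((aligned)) DefaultAligned {
  char Bytes[3];
};

// Explicit form: the alignment is fixed at 8 bytes on every target.
struct __attribute__((aligned(8))) Aligned8 {
  char Bytes[3];
};

// Whatever the target picks, it must be a power of two, and the size of the
// type is padded up to a multiple of its alignment.
bool AlignmentLooksSane() {
  const std::size_t A = alignof(DefaultAligned);
  return A != 0 && (A & (A - 1)) == 0 && sizeof(DefaultAligned) % A == 0;
}
```

Printing `alignof(DefaultAligned)` on a couple of different targets is an easy way to see the target-specific value for yourself.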

Other project commits

lld now understands --discard-locals. r235357.
lldb's 'operation' and 'monitor' threads on Linux have been merged in to one. r235304.
It's now possible to build compiler-rt for MIPS with gcc. r235299.
libunwind seems to have been moved into its own project. r235759.

LLVM Weekly - #68, Apr 20th 2015

Tue, 21 Apr 2015 01:52:00 +0000

Welcome to the sixty-eighth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Last week was of course EuroLLVM. It was good to put faces to names for a number of you, or to meet people I've only communicated with over twitter. Slides and videos should be forthcoming, but in the meantime you can read the liveblog I maintained for the talks I was able to attend. A big thank you to the organisers and all those who presented.

The highest profile news for some time in the LLVM community is that Microsoft are working on an LLVM-based compiler for .NET CoreCLR. What's more, LLILC (pronounced 'lilac') is open source, and they hope to contribute their LLVM changes upstream.

The Cerberus team are trying to find an answer to the question 'What is C in practice?', and you can help by filling out their survey.

Honza Hubička has posted a fantastic overview of the improvements to link-time and inter-procedural optimisations in GCC5, featuring figures from Firefox builds. It seems like impressive improvements have been made.

Simon Cook has written a blog post about using TableGen outside of LLVM, specifically for things like parameterisable SSH configs. Crazy? Genius? Why not both?

On the mailing lists

Duncan P. N. Exon Smith has proposed an RFC for metadata attachments to function definitions. There's some concern about the effect this would have on the size of Function.
Ivan Baev is proposing an indirect call promotion LLVM pass. This mailing list post gives design notes on the implementation and outlines the shortcomings of the current version of the patch.
Tom Stellard queries the difference between Latency and ResourceCycles in MISched. Andy Trick gives a very handy answer.
Paul Robinson is seeking feedback on adding a 'debugger target', arguing that currently the target platform is used to infer the desired debugger (and thus make choices on e.g. accelerator tables), but of course the choice of debugger doesn't necessarily follow from the platform.
Colin LeMahieu is looking to contribute an assembler for the Hexagon DSP, but has questions about how its funky grammar should be handled.

LLVM commits

The dereferenceable_or_null attribute has been born. As described in the commit message, the motivation is for managed languages such as Java. r235132.
A new layer was added to the Orc JIT API for applying arbitrary transforms to IR. r234805.
Memory intrinsics can now be tail calls. r234764.
A handy Python script has been added for generating C programs that have a control flow graph which is a ladder graph. The intent is to use it to test whether an algorithm expected to be linear time on the CFG really is. r234917.
The code for lowering switches and extracting jump tables has been rewritten, and should produce better results now. r235101.
Call can now take an explicit type parameter. r235145.

Clang commits

Clang learned -Wrange-loop-analysis, which will warn when a range-based for loop makes copies of elements in the range. r234804.
The preserve-bc-uselistorder option is no longer true by default, but Clang will set it for -emit-llvm or -save-temps. r234920.
LLVM has had a lot of changes to the debug API in the last week. This commit updates Clang for them. r235112.
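To show what -Wrange-loop-analysis is looking for, here is a small sketch with a copying loop and its by-reference fix. The function names are invented for this example, and whether the warning fires depends on the Clang version and enabled flags.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The loop variable is declared by value, so every iteration copies a
// std::string out of the vector. This is the pattern the warning describes.
std::size_t TotalLengthCopying(const std::vector<std::string> &Words) {
  std::size_t Total = 0;
  for (const std::string W : Words)
    Total += W.size();
  return Total;
}

// Binding by const reference reads the elements in place, with no copies.
std::size_t TotalLengthByRef(const std::vector<std::string> &Words) {
  std::size_t Total = 0;
  for (const std::string &W : Words)
    Total += W.size();
  return Total;
}

// Both loops compute the same result; only the number of copies differs.
void CheckRangeLoops() {
  const std::vector<std::string> Words = {"ab", "cde"};
  assert(TotalLengthCopying(Words) == TotalLengthByRef(Words));
}
```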

Other project commits

Reducing the amount of template use in LLD has reduced the size of AArch64TargetHandler.cpp.o to 4.1MB, down from 21MB. r234808, r234931.
A large patchset has landed in lldb which adds a big chunk of the work necessary for supporting lldb on ARM. r234870.

Tue, 14 Apr 2015 07:03:00 +0000

LLILC : An LLVM based compiler for dotnet CoreCLR.

The LLILC project (we pronounce it "lilac") is a new effort started at Microsoft to produce MSIL code generators based on LLVM and targeting the open source dotnet CoreCLR. We are envisioning using the LLVM infrastructure for a number of scenarios, but our first tool is a Just in Time (JIT) compiler for CoreCLR. This new project is being developed on GitHub and you can check it out here. The rest of this post outlines the rationale and goals for the project as well as our experience using LLVM.

Why a new JIT for CoreCLR?

While the CoreCLR already has a JIT, we saw an opportunity to provide a new code generator that has the potential to run across all the targets and platforms supported by LLVM. To enable this, as part of our project we're opening an MSIL reader that operates directly against the same common JIT interface as the production JIT (RyuJIT). This new JIT will allow any C# program written for the .NET Core class libraries to run on any platform that CoreCLR can be ported to and that LLVM will target.

There are several ongoing efforts to compile MSIL in the LLVM community; SharpLang springs to mind. Why build another one?

When we started thinking about the fastest way to get LLVM-based code generation working we looked around at the current open source projects as well as the code we had internally. While a number of the OSS projects already targeted LLVM BitCode, no one had anything that was a close match for the CoreCLR interface. Looking at our options it was simplest for us to refactor a working MSIL reader to target BitCode then teach an existing project to support the contracts and APIs the CoreCLR uses for JITing MSIL. Using an existing MSIL reader let us quickly start using a number of building-block components that we think the community can take advantage of. This fast bootstrap for C# across multiple platforms was the idea that was the genesis of this project and the compelling reason to start a new effort. We hope LLILC will provide a useful example - and reusable components - for the community and make it easier for other projects to interoperate with the CoreCLR runtime.

Why LLVM?

Basically we think LLVM is awesome. It's already got great support across many platforms and chipsets and the community is amazingly active. When we started getting involved, just trying to stay current with the developer mailing list was a revelation! The ability for LLVM to operate as both a JIT and as an AOT compiler was especially attractive. By bringing MSIL semantics to LLVM we plan to construct a number of tools that can work against CoreCLR or some subset of its components. We also hope the community will produce tools that we haven't thought of yet.

Tool roadmap

CoreCLR JIT
- Just In Time - A classic JIT. This is expected to be throughput-challenged but will be correct and usable for bringup. Also possible to use with more optimization enabled as a higher tier JIT
- Install-time JIT - What .NET calls NGen. This will be suitable for install-time JITing (LLVM is still slow in a runtime configuration)
Ahead of Time compiler. A build lab compiler that produces standalone executables, using some shared components from CoreCLR. The AOT compiler will be used to improve startup time for important command line applications like the Roslyn Compiler.

The LLILC JIT will be a functionally correct and complete JIT for the CoreCLR runtime. It may not have sufficient throughput to be a first-tier JIT, but is expected to produce high-quality code and so might make a very interesting second-tier or later JIT, or a good vehicle for prototyping codegen changes to feed back into RyuJIT.

What's Actually Working

Today on Windows we have the MSIL reader & LLVM JIT implemented well enough to compile a significant number of methods in the JIT bring up tests included in CoreCLR. In these tests we compile about 90% of the methods and then fall back to RyuJIT for cases we can't handle yet. The testing experience is pretty decent for developers. The tests we run can be seen in the CoreCLR test repo.

We've established builds on Linux and Mac OSX and are pulling together mscorlib, the base .NET Core library from CoreFx, and test asset dependencies to get testing off-the-ground for those platforms.

All tests run against the CoreCLR GC in conservative mode - which scans the frame for roots - rather than precise mode. We don't yet support Exception Handling.
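To make the conservative mode mentioned above a little more concrete, here is a minimal sketch of conservative root scanning in general. It is not CoreCLR's implementation: `HeapObject` and `FindConservativeRoots` are hypothetical names, and a real collector would also need to find the frame and walk the heap efficiently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one heap allocation: [Start, Start + Size).
struct HeapObject {
  std::uintptr_t Start;
  std::size_t Size;
};

// Treat every word in the frame as a possible pointer. Any word whose value
// lands inside a known object keeps that object alive, even if the word is
// really an integer that merely looks like an address.
// Returns the indices of the heap objects considered live.
std::vector<std::size_t>
FindConservativeRoots(const std::vector<std::uintptr_t> &FrameWords,
                      const std::vector<HeapObject> &Heap) {
  std::vector<std::size_t> Live;
  for (std::size_t I = 0; I < Heap.size(); ++I) {
    for (std::uintptr_t Word : FrameWords) {
      if (Word >= Heap[I].Start && Word - Heap[I].Start < Heap[I].Size) {
        Live.push_back(I);
        break; // one hit is enough to keep this object
      }
    }
  }
  return Live;
}

// A frame holding two plain integers and one interior pointer into object 1.
void CheckConservativeScan() {
  const std::vector<HeapObject> Heap = {{0x1000, 16}, {0x2000, 32}};
  const std::vector<std::uintptr_t> Frame = {42, 0x2008, 7};
  const std::vector<std::size_t> Live = FindConservativeRoots(Frame, Heap);
  assert(Live.size() == 1 && Live[0] == 1);
}
```

The trade-off is that no per-method GC information is required from the JIT, at the price of occasionally keeping dead objects alive. Precise mode removes that imprecision but needs the JIT to report exactly where GC references live.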

Architecture

Philosophically LLILC is intended to provide a lean interface between CoreCLR and LLVM. Where possible we rely on preexisting technology.

For the JIT, when we are compiling on demand, we map the runtime types and MSIL into LLVM BitCode. From there compilation uses LLVM MCJIT infrastructure to produce compiled code that is output to buffers provided by CoreCLR.

Our AOT diagram is more tentative but the basic outline is that the code generator is driven using the same interface as the JIT but that there is a static type system behind it and we build up a whole program module within LLVM and run in an LTO-like mode. Required runtime components are emitted with the output objs and the platform linker then produces the target executable. There are still a number of open questions around issues like generics that need resolution but this is our first stake in the ground.

Experience with LLVM

In the few months we've been using LLVM, we've had a really good experience but with a few caveats. Getting started with translating to BitCode has been a very straightforward experience and ramp-up time for someone with compiler experience has been very quick. The MCJIT, which we started with for our JIT, was easy to configure and get code compiled and returned to the runtime. Outside of the COFF issue discussed below, we only had to make adjustments in configuration or straightforward overrides of classes, like EEMemoryManager, to enable working code. Of the caveats, the first was simple, but the other two are going to require sustained work to bring up to the level we'd like. The first issue was a problem with Windows support in the DynamicRuntime of the MCJIT infrastructure. The last two, Precise Garbage Collection, and Exception Handling, arise because of the different semantics of managed languages. Luckily for us, people in the community have already started working in these areas so we don't have to start from zero.

COFF dynamic loader support

One of the additions we made to LLVM to unblock ourselves was to implement a COFF dynamic loader. (The patch to add RuntimeDyldCoff.{h,cpp} is through review and has been committed). This was the only addition we directly had to make to LLVM to enable bring-up of the code generator. Once this is in, we see a number of bugs in the database around Windows JIT support that should be easier to close.

Precise Garbage Collection

Precise GC is central to the CoreCLR approach to memory management. Its intent is to keep the overhead of managed memory as low as possible. It is assumed by the runtime that the JIT will generate precise information about the GC ref lifetimes and provide it with the compiled code for execution. To support this we're beginning to use the StatePoint approach, with additions to convert the standard output format to the custom format expected by CoreCLR. We share some of the same concerns that Philip Reames wrote about in the initial design of StatePoints. E.g. preservation of "GCness" through the optimizer is critical, but must not block optimizer transformations. Given this concern one of our open questions is how to enable testing to find GC holes that creep in, but also enable extra checking that can be opted into if the incoming IR contains GC pointers.

There is a more detailed document included in our repo that outlines our more-specific GC plans here.

Exception Handling

The MSIL EH model is specific to the CLR as you'd expect, but it descends in part conceptually from Windows Structured Exception Handling (SEH). In particular, the implicit exception flow from memory accesses to implement null checks, and the use of filters and funclets in the handling of exceptions, mirror SEH (here is an outline of C# EH). Our plans at this point are to add all checks required by MSIL as explicit compare/branch/throw sequences to better match C++ EH as well as building on the SEH support currently being put into Clang. Then, once we have correctness, we will see if there is a reasonable way forward that improves performance.
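As a rough picture of what an explicit compare/branch/throw sequence means, here is the idea written out at the C++ source level. This is only an analogy: LLILC emits LLVM IR rather than C++, and `Point`, `NullReference` and `LoadX` are hypothetical names used for this sketch.

```cpp
#include <cassert>
#include <stdexcept>

struct Point {
  int X;
};

// Hypothetical stand-in for the managed null reference exception.
struct NullReference : std::runtime_error {
  NullReference() : std::runtime_error("null reference") {}
};

// The null check is spelled out before the memory access, instead of relying
// on the access itself to fault and raise the exception implicitly.
int LoadX(const Point *P) {
  if (P == nullptr)        // compare and branch
    throw NullReference(); // throw
  return P->X;             // only reached with a non-null pointer
}

// The load succeeds for a valid pointer and throws for a null one.
void CheckExplicitNullCheck() {
  Point P{7};
  assert(LoadX(&P) == 7);
  bool Threw = false;
  try {
    LoadX(nullptr);
  } catch (const NullReference &) {
    Threw = true;
  }
  assert(Threw);
}
```

Written this way, the exception edge is ordinary control flow that the optimizer already understands, at the cost of an extra compare and branch on each checked access.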

Like GC, there's a detailed doc outlining our specific issues and plans in the repo here.

Future Work

More platforms. Today we're running on Windows and starting to build for Linux and Mac OSX. We'd like more.
Complete JIT implementation
- More MSIL opcodes supported
- Precise GC support
- EH support
Specialized memory allocators for hosted solutions. CoreCLR has been used as a hosted solution (run in process by other programs) but to support this we need a better memory allocation story. The runtime should be able to provide a memory allocator that is used for all compilation.
AOT - Fully flesh out the AOT story.

LLVM Weekly - #67, Apr 13th 2015

Mon, 13 Apr 2015 05:25:00 +0000

Welcome to the sixty-seventh issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

EuroLLVM is going on today and tomorrow in London. I hope to see a number of you there. Provided there's a reasonable internet connection, I hope to be live-blogging the event on the llvmweekly.org version of this issue.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

A new post on the LLVM Blog details how to use LLVM's libFuzzer for guided fuzzing of libraries.

The Red Hat developer blog has an article about libgccjit, a new feature in GCC5, which may be of interest.

On the mailing lists

Rui Ueyama proposes removing the 'native' file format from LLD. The hope was the native file format could be shared between LLD and LLVM and provide higher performance than standard ELF. In the end, it didn't see much development so it's being deleted for now.
Hal Finkel has some questions for compiler developers on optimisations of atomics. Answers will be fed back to the OpenMP standards committee, who are working to formalize their memory model and define its relationship to the C/C++ memory models.
The document about a proposed OpenMP offload infrastructure has been updated. Comments and feedback are very welcome.
Tom Stellard would like to remind you that bug fixes for the upcoming 3.6.1 release must be merged by the 4th of May.
Sanjoy Das is seeking some clarification on the semantics of shl nsw in the LLVM language reference. It seems that Sanjoy and David Majnemer are reaching an agreement in the thread, but they welcome differing viewpoints.

LLVM commits

The R600 backend gained an experimental integrated assembler. r234381.
The libFuzzer documentation has been extended to demonstrate how the Heartbleed vulnerability could have been found using it. r234391.
The preserve-use-list-order flags are now on by default. r234510.
LLVM gained a pass to estimate when branches in a GPU program can diverge. r234567.
The ARM backend learnt to recognise the Cortex-R4 processor. r234486.

Clang commits

Lifetime markers for named temporaries are now always inserted. r234581.
The quality of error messages for assignments to read-only variables has been enhanced. r234677.
clang-format's nested block formatting got a little better. r234304.

Other project commits

Support for the 'native' file format was removed from lld. r234641.
Remote debugging, the remote test suite, and the process to cross-compile lldb have been documented. r234317, r234395, r234489.
LLDB gained initial runtime support for RenderScript. r234503.

Simple guided fuzzing for libraries using LLVM's new libFuzzer

Thu, 09 Apr 2015 13:40:00 +0000

Fuzzing (or fuzz testing) is becoming increasingly popular. Fuzzing Clang and fuzzing with Clang is not new: Clang-based AddressSanitizer has been used for fuzz-testing the Chrome browser for several years and Clang itself has been extensively fuzzed using csmith and, more recently, using AFL. Now we’ve closed the loop and started to fuzz parts of LLVM (including Clang) using LLVM itself.

LibFuzzer, recently added to the LLVM tree, is a library for in-process fuzzing that uses Sanitizer Coverage instrumentation to guide test generation. With LibFuzzer one can implement a guided fuzzer for some library by writing one simple function:

extern "C" void TestOneInput(const uint8_t *Data, size_t Size);
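To give a feel for what such a function might look like, here is a minimal sketch of a fuzz target. The library function `ParseKeyValue` is a made-up stand-in for whatever code you want to test; only the `TestOneInput` signature comes from this post. Later libFuzzer releases renamed the entry point to `LLVMFuzzerTestOneInput` and made it return an int, so check the documentation for the version you build against.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>
#include <string>

// Hypothetical library function under test: splits "key=value" text.
// Returns false when there is no '=' or the key is empty.
bool ParseKeyValue(const std::string &Text, std::string &Key,
                   std::string &Value) {
  const size_t Eq = Text.find('=');
  if (Eq == std::string::npos || Eq == 0)
    return false;
  Key = Text.substr(0, Eq);
  Value = Text.substr(Eq + 1);
  return true;
}

// The fuzzer calls this once per generated input. The target only has to feed
// the bytes to the code under test; crashes and sanitizer reports are the
// signal that something is wrong, so the result is deliberately ignored.
extern "C" void TestOneInput(const uint8_t *Data, size_t Size) {
  const std::string Text(reinterpret_cast<const char *>(Data), Size);
  std::string Key, Value;
  ParseKeyValue(Text, Key, Value);
}

// Calling the target by hand, the way the fuzzer would with one input.
void CheckFuzzTarget() {
  const uint8_t Input[] = {'a', '=', 'b'};
  TestOneInput(Input, sizeof(Input));
  std::string Key, Value;
  assert(ParseKeyValue("a=b", Key, Value) && Key == "a" && Value == "b");
}
```

The target should be fast and free of global state carried between calls, since it is invoked many times in the same process.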

We have implemented two fuzzers on top of LibFuzzer: clang-format-fuzzer and clang-fuzzer. Clang-format is mostly a lexical analyzer, so giving it random bytes to format worked perfectly and discovered over 20 bugs. Clang however is more than just a lexer and giving it random bytes barely scratches the surface, so in addition to testing with random bytes we also fuzzed Clang in token-aware mode. Both modes found bugs; some of them were previously detected by AFL, some others were not: we’ve run this fuzzer with AddressSanitizer and some of the bugs are not easily discoverable without it.

Just to give you the feeling, here are some of the input samples discovered by the token-aware clang-fuzzer starting from an empty test corpus:

static void g(){}

signed*Qwchar_t;

overridedouble++!=~;inline-=}y=^bitand{;*=or;goto*&&k}==n

int XS/=~char16_t&s<=const_cast<Xchar*>(thread_local3+=char32_t

Fuzzing is not a one-off thing -- it shines when used continuously. We have set up a public build bot that runs clang-fuzzer and clang-format-fuzzer 24/7. This way, the fuzzers keep improving the test corpus and will periodically find old bugs or fresh regressions (the bot has caught at least one such regression already).

The benefit of in-process fuzzing compared to out-of-process is that you can test more inputs per second. This is also a weakness: you cannot effectively limit the execution time for every input. If some of the inputs trigger superlinear behavior, it may slow down or paralyze the fuzzing. Our fuzzing bot was nearly dead after it discovered exponential parsing time in clang-format. You can work around the problem by setting a timeout for the fuzzer, but it’s always better to fix superlinear behavior.

It would be interesting to fuzz other parts of LLVM, but a requirement for in-process fuzzing is that the library does not crash on invalid inputs. This holds for clang and clang-format, but not for, e.g., the LLVM bitcode reader.

Help is more than welcome! You can start by fixing one of the existing bugs in clang or clang-format (see PR23057, PR23052 and the results from AFL) or write your own fuzzer for some other part of LLVM or profile one of the existing fuzzers and try to make it faster by fixing performance bugs.

Of course, LibFuzzer can be used to test things outside of the LLVM project. As an example, and following Hanno Böck’s blog post on Heartbleed, we’ve applied LibFuzzer to openssl and found Heartbleed in less than a minute. Also, quite a few new bugs have been discovered in PCRE2 (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), Glibc and MUSL libc (1, 2).

Fuzz testing, especially coverage-directed and sanitizer-aided fuzz testing, should directly complement unit testing, integration testing, and system functional testing. We encourage everyone to start actively fuzz testing their interfaces, especially those with even a small chance of being subject to attacker-controlled inputs. We hope the LLVM fuzzing library helps you start leveraging our tools to better test your code, and let us know about any truly exciting bugs they help you find!

LLVM Weekly - #66, Apr 6th 2015

Mon, 06 Apr 2015 02:57:00 +0000

Welcome to the sixty-sixth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

color_coded, a vim plugin for syntax highlighting using libclang is now available.

Ravi, a dialect of Lua with JIT compilation via LLVM, has had its first alpha release. The status of JIT compilation can be seen here.

On the mailing lists

James Knight is asking for advice on supporting 64-bit load/store on a 32-bit arch. Respondents point to examples from ARM and R600.
Eric Christopher has kicked off another discussion on LTO and codegen options.
Katya Romanova proposes adding doxygen comments for intrinsics. The intention is that the current documentation would be converted in an automated way. So far, people seem to be in favour.
Douglas Gregor is stepping down as code owner of 'all parts of Clang not covered by someone else'. Richard Smith will be taking over. Thank you Douglas for the years of hard work.
Can you cross-compile LLVM's test suite? The answer is yes.
Duncan P.N. Exon Smith proposes that preserve-bc-use-list-order be on by default.

LLVM commits

API migration has started for GEP constant factories. For now, nullptr can be passed for the pointee type, but you'll need to pass the type explicitly to be future-proof. r233938.
A proof of concept fuzzer based on DataFlowSanitizer has been added, as well as support for token-based fuzzing. r233613, r233745.
DebugLoc's API has been rewritten. r233573.
The SystemZ backend now supports transactional execution on the zEC12. r233803.

Clang commits

Clang gained a toolchain driver for targeting NaCl. r233594.
The size of various Stmt subclasses has been optimised on 64-bit targets. r233921.
Codegen was added for the OpenMP atomic update construct. r233513.

Other project commits

LLDB system initialization has been reworked. r233758.

LLVM Weekly - #65, Mar 30th 2015

Mon, 30 Mar 2015 02:49:00 +0000

Welcome to the sixty-fifth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The Z3 theorem prover from Microsoft Research is now on Github, and more importantly now released under the MIT license. This is a true open source license allowing commercial use, unlike the previous non-commercial use only license. It's been used with LLVM in the ALIVe project.

The schedule for EuroLLVM has been published. There are still a number of early registration tickets left. If you can be in London on 13th and 14th of April then I'd highly recommend registering.

On the mailing lists

Tom Stellard, maintainer of the R600 backend, has a question about manipulating the machine scheduler to intermix ALU instructions with loads.
Sanjoy Das is seeking feedback on an optimisation issue he's seeing due to llvm.$op.with.overflow intrinsics. There is some followup discussion on how this should be dealt with.
Martin O'Riordan from Movidius asks for guidance on contributing changes back upstream. Tom Stellard has a useful response.
Dylan McKay is working on an AVR backend port and is seeking advice on lowering division calls. The current version of his backend is here.
Benoit Belley writes in with an optimisation puzzle, looking for an explanation for why an icmp isn't removed. Daniel Berlin responds just a couple of hours later with an explanation of the missed optimisation as well as a path to fix it.
Gordon Kaiser is looking for anyone interested in a backend for the Fujitsu FR-series processors, now manufactured by Spansion.

LLVM commits

The GlobalMerge pass will no longer run at O1 on AArch64+ARM, and instead will only be enabled at O3. r233024.
A float2int pass was added which, as the name suggests, attempts to demote from float to int where possible. r233062.
A simple Orc-based lazy JIT has been added to lli. r233182.
LLVM gained support for PowerPC hardware transactional memory. r233204.
The ARMv8.1a architecture has been added along with some of its new instructions. r233290, r233301.

Clang commits

The on-disk hash table for modules should now have a stable representation. r233156, r233249.
Intrinsics have been added for PowerPC hardware transactional memory support. r233205.
An initial version of a clang-fuzzer has been added, making use of the LLVMFuzzer library. r233455.

Other project commits

libclc gained more builtin implementations. r232977, r232965, r232964.
lld learnt how to understand the MIPS N64 relocation record format (which is described in the commit message). r233057.
lld's ARM support has improved with the addition of indirect function handling and GOT relocations. r233383, r233277.

LLVM Weekly - #64, Mar 23rd 2015

Mon, 23 Mar 2015 08:09:00 +0000

Welcome to the sixty-fourth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Students have until Friday 27th March to get their applications in for Google Summer of Code. This gives the opportunity to get paid $5500 to work on open source over the summer, under the mentorship of someone from the community. See here for the list of mentoring organisations advertising LLVM-related projects. Please do help spread the word. I am biased, but I'd like to draw particular attention to the wide variety of lowRISC GSoC ideas, including a project to implement an LLVM pass using tagged memory to provide protection against control-flow hijacking.

GCC 5 is starting to get near to release. The first release candidate is expected in the first week of April.

On the mailing lists

Peter Collingbourne has kicked off a thread on controlling the LTO optimization level. Using LTO can cause a massive increase in compile-time. Peter argues that for some features, like the recently added control flow integrity checks in Clang, you require LTO for whole program visibility but perhaps would rather do much fewer optimisations in order to get a more reasonable compile time. He proposes a -flto-level command line option.
Eric Christopher has written to the mailing list about recent changes he's made to the TargetMachine::getSubtarget API.
Dario Domizioli proposes adding target-specific defaults for options in the LLVM tools. Response was mostly negative on the grounds that opt, llc, and friends are tools meant for LLVM developers rather than end-users. Sean Silva attempted to clarify the difference between LLVM users, LLVM end-users, and LLVM developers.

LLVM commits

The backend for the Hexagon DSP has continued to grow over the past few weeks. Most recently, support for vector instructions has been added. r232728.
The LLVM developer documentation grew guidance on writing commit messages. r232334.
LLVM learnt to support the ARMv6k target. The commit message has a handy ascii art diagram to explain its position in the ARM family. r232468.

Clang commits

The size of a Release+Asserts clang binary has been reduced by ~400k by devirtualising Attr and its subclasses. r232726.
Work on MS ABI continues, with support for HandlerMap entries for C++ catch. r232538.
A new warning, -Wpartial-availability, will warn when using decls that are not available on all deployment targets. r232750.
C++14 sized deallocation has been disabled by default due to compatibility issues. r232788.

Other project commits

Performance of a self-hosted lld link has again been improved. It's now down to 3 seconds on the patch author's machine (vs 5 seconds before, and 8 seconds for the GNU BFD linker). r232460.
libcxx gained the <experimental/tuple> header which implements most of the tuple functionality specified in the library fundamentals TS. r232515.
LLD now supports the semantics of simple sections mappings in linker scripts and can handle symbols defined in them. r232402, r232409.
Mips64 lldb gained an initial assembly profiler. r232619.

LLVM Weekly - #63, Mar 16th 2015

Mon, 16 Mar 2015 08:35:00 +0000

Welcome to the sixty-third issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

LLVM is taking part in Google Summer of Code as a mentoring organisation. Students can earn a $5500 stipend by working on open source projects over the summer, and applications open today (March 16th). See here for the list of mentoring organisations advertising LLVM-related projects. Please do help spread the word. I am biased, but I'd like to draw particular attention to the wide variety of lowRISC GSoC ideas, including a project to implement an LLVM pass using tagged memory to provide protection against control-flow hijacking.

Version 0.11 of Pocl, the portable open-source OpenCL implementation has been released. Additions in this release include initial Android support and MIPS architecture support.

Version 1.11 of TCE, the TTA-based (Transport Triggered Architecture) Co-design Environment, which uses LLVM has been released. This release adds support for LLVM 3.6.

There will be an LLVM microconference at the Linux Plumbers Conference in August. There is a call for speakers.

On the mailing lists

If you enjoy bikesheds, this thread may be for you. Renato Golin has kicked off a thread on commit message policy.
Rui Ueyama has posted to the mailing list summarising his recent work on LLD performance improvements. The follow up responses discuss potential remaining bottlenecks.
William Moses is interested in parallel extensions to LLVM IR, and the thread spawned quite a few interesting responses.
Mohammad Kazem's question about counting loads and stores of variables elicited some useful responses.
Will Dietz who has been maintaining the unofficial LLVM Github mirror for the past few years is interested in LLVM taking over the service 'officially'.

LLVM commits

As DataLayout is now mandatory, LLVM APIs have been updated to use references to DataLayout. r231740.
Support was added for part-word atomics on PowerPC. r231843.
Initial work on enhancing ValueTracking to infer known bits of a value from known-true conditional expressions has landed. r231879.
The PowerPC READMEs have been updated to list potential future enhancements. r231946.
The llvm.eh.actions intrinsic has been added. r232003.
The documentation for llvm-cov has been updated. r232007.
The getting started docs now describe CMake as the preferred way to build LLVM. r232135.
llvm-vtabledump is now known as llvm-cxxdump. r232301.

Clang commits

The steady stream of OpenMP patches continues, with the addition of codegen support for the omp task directive and omp for. r231762, r232036.
Copy-constructor closures for MS ABI support has been added. r231952.

Other project commits

LLD gained support for linker script expression evaluation and parsing of the MEMORY and EXTERN commands. r231707, r231928, r232110.
LLDB gained a CODE_OWNERS.txt file. r231936.

LLVM Weekly - #62, Mar 9th 2015

Mon, 09 Mar 2015 03:36:00 +0000

Welcome to the sixty-second issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

LLVM is taking part in Google Summer of Code as a mentoring organisation. Students can earn a $5500 stipend by working on open source projects over the summer. See here for the list of mentoring organisations advertising LLVM-related projects. Please do help spread the word: applications open on Monday the 16th of March. I am biased, but I'd like to draw particular attention to the wide variety of lowRISC GSoC ideas, including a project to use tagged memory to provide protection against control-flow hijacking.

Ravi, a programming language based on Lua 5.3 has been announced. It features JIT compilation using LLVM, though in the current development version only a fraction of the Lua bytecodes are JIT-compiled.

On the mailing lists

Douglas Gregor has posted an RFC on adding nullability qualifiers. The mailing list post justifies the reason for adding new qualifiers despite the fact __attribute__((nonnull)) exists.
Jonas Paulsson queries the current status of spilling support in the PBQP register allocator. As Arnaud confirms, there's still work to be done to improve things.
Chris Bieneman has posted an update on the CMake build system's ability to replace autoconf.
Tom Stellard has shared his proposed release schedule for 3.5.2 and 3.6.1. This would see 3.5.2 released on the 25th of March and 3.6.1 on the 13th of May.

LLVM commits

An initial implementation of a loop interchange pass has landed. This will interchange loops to provide a more cache-friendly memory access. r231458.
A high-level support library for the new pass manager has been added. r231556.
The LLVM performance tips document has seen some new additions. r230995, r231352.
DenseMapIterators will now fail fast when compiled in debug mode. r231035.
LowerBitSets will now use byte arrays rather than bit sets to represent in-memory bit sets, which can be looked up with only a few instructions. r231043.
Another large portion of the DebugInfo changes has landed. r231082.
A new optimisation for AddressSanitizer has been added that reduces the amount of instrumentation needed, eliminating it when accessing stack variables that can be proven to be inbounds. r231241.
llvm.frameallocate has been replaced with llvm.frameescape. r231386.
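
To illustrate the loop interchange item in the list above, here is a minimal hand-written sketch of the transformation on a row-major matrix. The function names and the flat-vector layout are assumptions made for this example and are not taken from the commit; whether the pass actually interchanges a given loop nest depends on its own legality and profitability checks.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: the matrix is stored row-major in a flat vector,
// so element (i, j) lives at index i * cols + j.

// Before interchange: the inner loop walks down a column, so consecutive
// accesses are `cols` elements apart in memory.
long sum_inner_over_rows(const std::vector<int>& m, std::size_t rows,
                         std::size_t cols) {
    long total = 0;
    for (std::size_t j = 0; j < cols; ++j)
        for (std::size_t i = 0; i < rows; ++i)
            total += m[i * cols + j];
    return total;
}

// After interchange: the inner loop walks along a row, so consecutive
// accesses touch adjacent elements, which is friendlier to the cache.
long sum_inner_over_cols(const std::vector<int>& m, std::size_t rows,
                         std::size_t cols) {
    long total = 0;
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            total += m[i * cols + j];
    return total;
}
```

Both functions compute the same sum; only the order in which memory is visited differs.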

Clang commits

When the -pedantic flag is given, clang will warn when a format string uses %p with non-void* args. r231211.
Work on MS ABI support continues. Throwing a C++ exception under the MS ABI is now supported. r231328.
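
As a small sketch of the %p item above: %p is specified to take a pointer to void, so code that prints other pointer types can add an explicit cast. The helper name `format_address` is hypothetical and exists only for this example.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats the address held in an int pointer.
std::string format_address(int* p) {
    char buffer[32];
    // %p expects a void pointer, so the int* is cast explicitly.
    std::snprintf(buffer, sizeof buffer, "%p", static_cast<void*>(p));
    return buffer;
}
```

The textual form of the address is implementation-defined, so the example makes no claim about the exact output.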

Other project commits

The lld resolver has had a significant performance optimisation. The commit message indicates linking chrome.dll now takes 30 seconds down from 70 seconds. r231434.
The static binary size of lldb-server has been reduced due to a reduction in the number of initialised components. r230963.

LLVM Weekly - #61, Mar 2nd 2015

Mon, 02 Mar 2015 03:12:00 +0000

Welcome to the sixty-first issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The biggest headline this week is undoubtedly the release of LLVM/Clang 3.6. See the LLVM 3.6 release notes and the Clang 3.6 release notes for a full run-down of the major changes.

The LLVMSharp C# and .NET bindings to LLVM have been released.

Pyston, the LLVM-based Python JIT developed by Dropbox has had its 0.3 release. It is now minimally self-hosting. You can also see performance results online.

Readers may enjoy this walkthrough of creating a basic compiler with LLVM.

On the mailing lists

Diego Novillo has announced Google's intention to work on PGO profiling support in LLVM. We can expect a document soon to give more detail on the plans and stimulate further discussion.
Ashutosh Nema is proposing a new loop versioning optimisation. This is where multiple versions of the loop are generated and the implementation chosen based on runtime memory aliasing tests. It was suggested that some recent work on Loop Access Analysis provides some of this functionality.
Philip Reames has suggested writing a performance guide for frontend authors. Unsurprisingly, the idea is popular.
Zachary Turner has suggested separating embedded Python from the rest of LLDB. As detailed in the post, it is difficult to provide compatibility with the standard Python binary build for Windows and precompiled Python modules.
Ahmed Bougacha started a discussion on disabling GlobalMerge, which is currently enabled for ARM and AArch64. Much of the ensuing discussion centers around understanding why there seems to be a performance degradation when using GlobalMerge with LTO.
Katya Romanova moved a discussion on a jump threading optimisation bug to llvm-dev. The issue is due to the fact an unreachable block is generated with an ill-formed instruction, and there is a lot of follow-on discussion about whether passes should generate unreachable blocks.
Dibyendu Majumdar wrote to the list to ask about issues eliminating redundant loads. He managed to work out the issue.
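
The loop versioning idea mentioned in the list above can be sketched by hand as follows. This is an illustration under stated assumptions, not the proposed pass: all function names are hypothetical, and the "blocked" loop merely stands in for the vectorised version a compiler might emit once it knows the two ranges do not alias.

```cpp
#include <cstddef>
#include <functional>

// Returns true when [a, a+n) and [b, b+n) do not overlap.
// std::less provides a well-defined total order over pointers.
bool ranges_disjoint(const int* a, const int* b, std::size_t n) {
    std::less<const int*> before;
    return !before(a, b + n) || !before(b, a + n);
}

// Safe version: one element at a time, exactly as the source loop says.
void add_one_scalar(int* dst, const int* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i] + 1;
}

// Fast version: loads four elements before storing any of them, which is
// only equivalent to the scalar loop when dst and src do not overlap.
void add_one_blocked(int* dst, const int* src, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        int v0 = src[i], v1 = src[i + 1], v2 = src[i + 2], v3 = src[i + 3];
        dst[i] = v0 + 1;
        dst[i + 1] = v1 + 1;
        dst[i + 2] = v2 + 1;
        dst[i + 3] = v3 + 1;
    }
    for (; i < n; ++i) dst[i] = src[i] + 1;  // remainder
}

// Versioned loop: a runtime aliasing test picks which copy runs.
void add_one_versioned(int* dst, const int* src, std::size_t n) {
    if (ranges_disjoint(dst, src, n))
        add_one_blocked(dst, src, n);
    else
        add_one_scalar(dst, src, n);
}
```

When the ranges overlap (for example `dst == src + 1`), the blocked loop would read stale values, which is why the runtime check has to fall back to the scalar copy.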

LLVM commits

Work has started on the move towards opaque pointer types. See the commit messages for more details and help on migrating existing textual IR. r230786, r230794.
The PlaceSafepoints and RewriteGCForStatepoints passes have been documented. r230420.
The GC statepoints documentation has been cleaned up and extended with example IR, assembly, and stackmaps. r230601.
The loop-invariant code motion pass has been refactored to expose its core functionality as utility functions that other transformations could use. r230178.
Implementation of support for alloca on MIPS fast-isel has started. r230300.
The PowerPC backend gained support for the QPX vector instruction set. r230413.
InductiveRangeCheckElimination can now handle loops with decreasing induction variables. r230618.
Among other improvements, llvm-pdbdump gained colorized output. r230476.
The Forward Control Flow Integrity Pass has been removed as it is being rethought and is currently unused. r230780.
The Performance Tips for Frontend Authors document was born. r230807.

Clang commits

The control flow integrity design docs have been updated to document optimisations. r230458, r230588.

Other project commits

Remote testing support was added to the libc++ and libc++abi test suites. r230592, r230643.
LLD learned to understand .gnu.linkonce input sections. r230194.

LLVM 3.6 Release

Fri, 27 Feb 2015 12:48:00 +0000

LLVM 3.6 is now available!

Get it here: http://llvm.org/releases/

This release contains the work of the LLVM community over the past six months: many many bug fixes, optimization improvements, support for more proposed C++1z features in Clang, better native Windows compatibility, embedding LLVM IR in native object files, Go bindings, and more. For details, see the release notes [LLVM, Clang].

Many thanks to everyone who helped with testing, fixing, and getting the release into a good state!

Special thanks to the volunteer release builders and testers, without whom this release would not be possible: Dimitry Andric, Sebastian Dreßler, Renato Golin, Sylvestre Ledru, Ben Pope, Daniel Sanders, and Nikola Smiljanić!

If you have any questions or comments about this release, please contact the community on the mailing lists. Onwards to 3.7!

LLVM Weekly - #60, Feb 23rd 2015

Mon, 23 Feb 2015 03:05:00 +0000

Welcome to the sixtieth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

LLVM/Clang 3.6.0-rc4 is now available for testing.

A new LLVM-based tainted flow analysis tool has been created. It's a tool designed to help detect timing attack vulnerabilities. An online demo is available.

The March bay-area LLVM social will take place on Thursday 5th March, along with the Game Developer's Conference.

The Cambridge LLVM social will take place on Wednesday 25th Feb.

HHVM, the optimised PHP virtual machine from Facebook, plans to integrate an LLVM-based optimisation phase.

On the mailing lists

Lefteris Ioannidis has introduced himself on the mailing list. He is working on propagating parallelism at the IR level, with a hope to ultimately upstream his work. He's interested in chatting to anyone working in this area.
Eric Fiselier asks when libc++ can list Linux as an officially supported platform.
Hans Wennborg is asking for people to flesh out the 3.6 release notes with more details.
Hayden Livingston is working on a tool to help understand how LLVM IR changes after optimization passes are run and wonders about adding a new API to support this use case. Greg Fitzgerald points to his handy diffdump tool.
Bruce Mitchener suggests adding SWIG bindings to LLDB for JS and other languages.
Zephyr Zhao has shared his work on a GUI frontend to LLDB.

LLVM commits

The coding standards document has been updated now that MSVC 2012 support has been dropped. r229369.
The Orc API continues to evolve. The JITCompileCallbackManager has been added to create and manage JIT callbacks. r229461.
A new pass, the bit-tracking dead code elimination pass, has been added. It tracks dead bits of integer-valued instructions and removes those instructions when all of their bits are dead. r229462.
The SystemZ backend now supports all TLS access models. r229652, r229654.
A new pass for constructing gc.statepoint sequences with explicit relocations was added. The pass will be further developed and bugfixed in-tree. r229945.
The old x86 vector shuffle lowering code has been removed (the new shuffle lowering code has been the default for ages and known regressions have been fixed). r229964.
A new bitset metadata format and lowering pass has been added. In the future, this will be used to allow a C++ program to efficiently verify that a vtable pointer is in the set of valid vtable pointers for the class or its derived classes. r230054.

Clang commits

clang-format gained support for JS type annotations and classes. r229700, r229701.
Most of the InstrProf coverage mapping generation code has been rewritten. r229748.
Clang learnt how to analyze FreeBSD kernel printf extensions. r229921.
Support has been added to Clang for a form of Control Flow Integrity for virtual function calls. It verifies the vptr of the correct dynamic type is being used. r230055.

Other project commits

ThreadSanitizer gained support for MIPS64. r229972.
lldb now supports process language on Android from lldb-gdbserver. r229371.
OpenMP gained a new user-guided lock API. r230030.

LLVM Weekly - #59, Feb 16th 2015

Mon, 16 Feb 2015 03:15:00 +0000

Welcome to the fifty-ninth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Reminder, the EuroLLVM 2015 call for papers submission deadline is TODAY. See here for details.

LLVM and Clang 3.6-rc3 have been tagged; any help with testing is greatly appreciated.

On the mailing lists

The discussion about moving towards a singular pointer type has continued. It seems everyone is in favour.
Philip Reames has proposed an attribute that indicates the value is either null or dereferenceable. The intended use case is for Java, though it may be useful for various other higher level languages.
Duncan P.N. Exon Smith is concerned that llc is becoming less useful as a debugging tool and proposes modifying Clang to store target defaults on a module.
Alexey Samsonov proposes dropping support for building the sanitizers with autotools. The autotools build system never reached feature parity with CMake for the sanitizers and is undertested. There seem to be no objections.

LLVM commits

The biggest chunk of internal refactoring of debug metadata has landed, with the addition of specialized debug info metadata nodes. r228640.
New llvm.eh.begincatch and llvm.eh.endcatch intrinsics have been added to support Windows exception handling. r228733.
A DebugInfoPDB implementation using the MS Debug Interface Access SDK has landed. r228747.
SimplifyCFG will now use TargetTransformInfo for cost analysis. r228826.
A profitability heuristic has been added for the x86 mov-to-push optimisation. r228915.
PassManager.h is now LegacyPassManager.h. As described in the commit message, if you are an out of tree LLVM user you may need to update your includes. r229094.

Clang commits

The /volatile:ms semantics have been implemented, turning volatile loads and stores into atomic acquire and release operations. r229082.

Other project commits

C++14's sized deallocation functions have been implemented in libcxx. r229281.
lld learnt to handle the --wrap option. r228906.
lldb gained the concept of "runtime support values". r228791.
The remote-android platform has been added to lldb. r228943.

LLVM Weekly - #58, Feb 9th 2015

Mon, 09 Feb 2015 08:33:00 +0000

Welcome to the fifty-eighth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The Red Hat developer blog has a post about the plan to change the G++ ABI along with GCC 5. This is required for full C++11 compatibility. Unlike the last ABI change where the libstdc++ soname was changed, it will stay the same and instead different mangled names will be used for symbols.

Quarkslab has a tutorial on how to add a simple obfuscation pass to LLVM.

On the mailing lists

It is currently planned to raise the LLVM minimum required MSVC to 2013. If you are using MSVC 2012 to build LLVM and this would cause significant hardship to you for some reason, now is the time to speak up.
Chandler Carruth has shared a handy cheatsheet for those maintaining an out-of-tree target on how to adjust to the recent TargetTransformInfo changes.
The idea of dropping pointer types in LLVM IR has been brought up a few times recently. David Blaikie is interested in volunteering to do the necessary work. Much of the ensuing discussion is about how this might be done in an incremental way, without causing too many problems for people with out of tree passes. Chandler Carruth proposes a rough work-plan.
James Molloy has posted an RFC on inlining of recursive functions.
Chris Bieneman has summarised progress on bringing the CMake build system to the point it can replace autoconf in LLVM/Clang.
Karthik Bhat has posted an RFC on adding a LoopInterchange pass to LLVM. This is a pass targeted at improving performance through cache locality.

LLVM commits

A straight-line strength reduction pass has been introduced. This is intended to simplify statements that are generated after loop unrolling. It is enabled only for NVPTX for the time being. r228016.
A MachineInstruction pass has been added that converts stack-relative moves of function arguments to use the X86 push instruction. This is only enabled when optimising for code size. r227752.
The BasicAA will now try to disambiguate GetElementPtr through arrays of structs into different fields. r228498.
Work on improving support in LLVM for GC continues, with the addition of a pass for inserting safepoints into arbitrary IR. r228090.
(Very) minimal support for the newly announced ARM Cortex-A72 landed. For now, the A72 is modeled as an A57. r228140.
A new heuristic has been added for complete loop unrolling, which looks at what loads might be made constant if the loop is completely unrolled. r228265.
A pass to exploit PowerPC's pre-increment load/store support has been added. r228328.
A platform-independent interface to a PDB reader has landed. r228428.
LLVM learnt to recognise masked gather and scatter intrinsics. r228521.
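
As a rough illustration of the straight-line strength reduction item in the list above, the sketch below shows one shape of code that unrolling can leave behind and a cheaper equivalent. It is hand-written C++ rather than LLVM IR, and the function names are hypothetical; the actual pass works on IR and applies its own candidate selection.

```cpp
#include <array>

// Before: three independent multiplications by the same stride, as might
// be left behind after a loop has been unrolled.
std::array<unsigned, 3> offsets_with_multiplies(unsigned base, unsigned stride) {
    return {base * stride, (base + 1) * stride, (base + 2) * stride};
}

// After: one multiplication, then each later value is derived from the
// previous one with an addition.
std::array<unsigned, 3> offsets_with_adds(unsigned base, unsigned stride) {
    unsigned x = base * stride;
    unsigned y = x + stride;  // (base + 1) * stride
    unsigned z = y + stride;  // (base + 2) * stride
    return {x, y, z};
}
```

Unsigned arithmetic is used so that both forms agree even when the values wrap around.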

Clang commits

Clang learnt the 'novtable' attribute (for MS ABI support). r227796, r227838.
New functionality has been added for thread safety analysis: before/after annotations can now be used on mutexes. r227997.

Other project commits

A whole bunch of work on LLDB with multithreaded applications on Linux has landed. r227909, r227912, r227913, and more.
The default Polly build is now completely free of GPL dependencies. The isl and imath dependencies have been imported into the codebase to make it easier to build with a known-good revision. r228193.

LLVM Weekly - #57, Feb 2nd 2015

Mon, 02 Feb 2015 05:52:00 +0000

Welcome to the fifty-seventh issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

I've been at FOSDEM this weekend in Brussels (which is why this week's issue is perhaps a little shorter than usual!). Most talks were recorded and I'll be linking to the videos from the LLVM devroom once they're up. For those interested, you can see the slides from my lowRISC talk here. If you want to chat about the project, you may want to join #lowRISC on irc.oftc.net.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Eli Bendersky has written a useful introduction to using the llvmlite Python to LLVM binding, which was born out of the Numba project.

LLVM/Clang 3.6-rc2 has been tagged and is ready for testing.

The next LLVM bay-area social is taking place on Feb 5th at 7pm.

The EuroLLVM call for papers closes on Feb 16th.

On the mailing lists

David Majnemer has attempted to describe the often confusing 'poison' semantics for LLVM, and submitted an RFC. Masses of discussion follows.
Dylan McKay has been working on an LLVM backend for AVR and has come to the mailing list with two questions on instruction encoding in the last week. They're interesting questions with useful answers - how to modify the encoding based on target features and how to encode instructions with inconsistent formats.
Saleem Abdulrasool kicked off a long discussion on where libunwind should live.
Andrew Kaylor has posted an RFC on adding support for native windows C++ exception handling.
Matt Arsenault has posted an RFC on adding an ISD node for fused multiply add operations.
A question about the meaning of RAUW is a good opportunity to highlight the existence of the very handy LLVM lexicon.

LLVM commits

A simple in-process fuzzer was added to LLVM. r227252.
The programmer's manual gained a section about type hierarchies, polymorphism, and virtual dispatch. r227292.
The upstreaming of Sony's patches for their PS4 compiler started with the addition of the PS4 target triple. r227060.
DataLayout now lives again in the TargetMachine rather than the TargetSubtargetInfo. r227113.
RuntimeDyld learned to support weak symbols. r227228.
LLVM gained a new tool, llvm-pdbdump to dump the contents of Microsoft PDB ('Program DataBase') files, including debug tables. r227241, r227257.
The loop vectorizer now supports an arbitrary constant step for its induction variables, rather than just -1 or +1. r227557.
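
For the loop vectorizer item above, the following sketch shows the kind of loop that has an induction variable with a constant step other than -1 or +1. The function name is hypothetical, and whether a particular compiler build vectorises this loop depends on its version, flags, and target.

```cpp
#include <cstddef>
#include <vector>

// Fills out[i] with start + 3 * i. `value` is a secondary induction
// variable that advances by a constant step of 3 on every iteration.
std::vector<int> fill_with_step(std::size_t n, int start) {
    std::vector<int> out(n);
    int value = start;
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = value;
        value += 3;
    }
    return out;
}
```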

Clang commits

The clang-format-fuzzer tool was added, which builds on the LLVM fuzzer lib. r227354.
MS ABI work continues with proper support for setjmp. r227426.
Clang started to learn about the PS4 target triple. r227194.

Other project commits

The PowerPC ELF target was dropped from lld. r227320.

LLVM Weekly - #56, Jan 26th 2015

Mon, 26 Jan 2015 06:56:00 +0000

Welcome to the fifty-sixth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

I'll be talking about the lowRISC project to produce a fully open-source SoC at FOSDEM this coming weekend. Do come and see my main track talk and read my speaker interview for more background. There is of course an LLVM toolchain devroom on the Sunday.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Stephen Diehl has written an absolutely fantastic tutorial on writing an LLVM specializer for Python, guiding you through the process of creating something like Numba.

A new tool, Dwgrep (DWARF Grep) may be of interest to many LLVM Weekly readers. This blog post gives an intro to using it.

Paul Smith has a blog post on getting started with the LLVM C API.

A post on the official LLVM Blog announces that LLDB is coming to Windows, announcing to a wider audience that it is now possible to debug simple programs with LLDB on Windows and giving a rationale for investing effort into porting LLDB to Windows and adding support for the MS debug format. The post also features a todo list indicating what's next for Windows support.

A draft version 0.1 of the IA-32 psABI (processor specific application binary interface) is available. This aims to supplement the existing System V ABI with conventions relevant to newer features such as SSE1-4 and AVX. Comments are welcome.

LLVM/Clang 3.6-rc1 is now available. Get testing and filing bugs.

ELLCC 0.1.8 has been released. ELLCC is an LLVM/Clang-based cross compilation toolchain.

LLDB now has its own IRC channel. You'll want to join #lldb on irc.oftc.net.

On the mailing lists

Chandler Carruth has posted a canonicalization-related RFC. He demonstrates a case where a trivial function is compiled to two equivalent IR sequences. This was later committed.
Michael Zolotukhin has proposed an RFC on adding a heuristic for complete loop unrolling. Currently, the loop unrolling heuristics don't take account of any new optimisations that may be enabled by unrolling the loop. Changing that would allow for the profitability of the unroll to be more accurately approximated.
Chandler Carruth is getting ready to turn on by default the next part of his vector shuffle work. Now is a good time to benchmark and report any regressions you see with -x86-experimental-vector-shuffle-legality on your codebases.
Ahmed Bougacha has been having problems with the cost model calculations for saturation instructions. The cost is over-estimated because a number of the individual IR instructions fold away later in lowering. He suggests adding a new method to TargetTransformInfo for multi-instruction cost computation. There hasn't been much feedback thus far.
Chandler Carruth has been looking through the LLD libraries and trying to work out the current layering, as well as what a potential future layering might be. He proposes a base library providing core functionality and a second library offering a higher-level interface for actually doing linking.

LLVM commits

A backend targeting the extended BPF (Berkeley Packet Filter) interpreter/JIT in the Linux kernel has been added. See this LWN article for more background. r227008.
The initial version of the new ORC JIT API has landed. r226940.
There's been a flurry of work on the new pass manager this week. One commit I will choose to pick out is the port of InstCombine to the new pass manager, which seems like a milestone of sorts. r226987.
LLVM learnt how to use the GHC calling convention on AArch64. r226473.
InstCombine will now canonicalize loads which are only ever stored to always use a legal integer type if one is available. r226781.
The llvm_any_ty type for intrinsics has been born. r226857.
llvm-objdump now understands -indirect-symbols to dump the Mach-O indirect symbol table. r226848.

Clang commits

Clang now supports SPIR calling conventions. r226548.
It's now possible to set the stack probe size on the command line. r226601.
Clang gained initial support for Win64 SEH IR emission. r226760.

Other project commits

Sun Solaris users, now is the time to celebrate. libc++ will now build on your platform of choice. r226947.
A minimal implementation of ARM static linking landed in lld. r226643.
Basic support for PPC was added to openmp. r226479.

LLDB is Coming to Windows

Tue, 20 Jan 2015 10:48:00 +0000

We've spoken in the past about teaching Clang to fully support Windows and be compatible with MSVC. Until now, a big missing piece in this story has been debugging the clang-generated executables. Over the past 6 months, we've started working on making LLDB work well on Windows and support debugging both regular Windows programs and those produced by Clang.

Why not use an existing debugger such as GDB, Visual Studio's, or WinDBG? There are a lot of factors in making this kind of decision. For example, while GDB understands the DWARF debug information produced by Clang on Windows, it doesn't understand the Microsoft C++ ABI or debug information format. On the other hand, neither Visual Studio nor WinDBG understands the DWARF debug information produced by Clang. With LLDB, we can teach it to support both of these formats, making it usable with a wider range of programs. There are also other reasons why we're really excited to work on LLDB for Windows, such as the tight integration with Clang which lets it support all of the same C++ features in its expression parser that Clang supports in your source code. We're also looking to continue adding new functionality to the debugging experience going forward, and having an open source debugger that is part of the larger LLVM project makes this really easy.

The past few months have been spent porting LLDB's core codebase to Windows. We've been fixing POSIX assumptions, enhancing the OS abstraction layer, and removing platform specific system calls from generic code. Sometimes we have needed to take on significant refactorings to build abstractions where they are necessary to support platform specific differences. We have also worked to port the test infrastructure to Windows and set up build bots to ensure things stay green.

This preliminary bootstrapping work is mostly complete, and you can use LLDB to debug simple executables generated with Clang on Windows today. Note the use of the word "simple". At last check, approximately 50% of LLDB's tests fail on Windows. Our baseline, however, which is a single 32-bit executable (i.e. no shared libraries), single-threaded application built and linked with Clang and LLD using DWARF debug information, works today. We've tested all of the fundamental functionality such as:

Various methods of setting breakpoints (address, source file+line, symbol name, etc)
Stopping at and continuing from breakpoints
Process inspection while stopped, such as stack unwinding, frame setting, memory examination, local variables, expression evaluation, stepping, etc (one notable exception to this is that step-over doesn't yet work well in the presence of limited symbol information).

Of course, there is still more to be done. Here are some of the areas we're planning to work on next:

Fixing low hanging fruit by improving the pass-rate of the test suite.
Better support for debugging multi-threaded applications.
Support for debugging crash dumps.
Support for debugging x64 binaries.
Enabling stepping through shared libraries.
Understanding PDB (for debugging system libraries, and executables generated with MSVC). Although the exact format of PDB is undocumented, Microsoft still provides a rich API for querying PDB in the form of the DIA SDK.
Adding debugging commands familiar to users of WinDBG (e.g. !handle, !peb, etc)
Remote debugging
Symbol server support
Visual Studio integration

If you're using Clang on Windows, we would encourage you to build LLDB (it should be in the Windows LLVM installer soon) and let us know your thoughts by posting them to lldb-dev. Make sure you file bugs against LLDB if you notice anything wrong, and we would love for you to dive into the code and help out. If you see something wrong, dig in and try to fix it, and post your patch to lldb-commits.

LLVM Weekly - #55, Jan 19th 2015

Mon, 19 Jan 2015 09:58:00 +0000

Welcome to the fifty-fifth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

It seems to have been a very busy week in the world of LLVM, particularly with regards to discussion on the mailing list. Due to travel etc and the volume of traffic, I haven't been able to do much summarisation of mailing list discussion I'm afraid.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

LLVM/Clang 3.6 has been branched and subsequently, 3.6 RC1 has been tagged.

LLVM/Clang 3.5.1 seems to have been quietly released.

Registration for EuroLLVM 2015, to be held at Goldsmiths College in London, UK on April 13-14th is now open.

All slides and videos from the last LLVM Developers' meeting are now live, including those from Apple employees.

On the mailing lists

Ahmed Bougacha has posted an RFC on adding integer saturation intrinsics to LLVM. There are various questions in the ensuing thread about whether adding an intrinsic is necessary and the best way to go about this, i.e. whether it is possible to just pattern match later on in the compilation flow.
In response to a question about using LLDB when attached to a system that may switch between AArch32 and AArch64, Colin Riley has written a good description of current support and potential future support for multiple targets in one debugger session.
Jonathan Ragan-Kelley asks about emitting IR in older formats due to a requirement to emit 3.2-compatible bitcode for Nvidia's libNVVM. Several replies suggest looking at SPIR.
Do you wonder what the difference is between the multiple ways of querying operation costs? Wonder no more.
Duncan P.N. Exon Smith has posted an RFC on first-class debug info in IR. There have been a few changes since the previous proposal.
Chandler Carruth has written a summary of alias analysis pass ordering in LLVM and Clang. This details both the current situation as well as Chandler's views on how it should change in the future.
Philip Reames is seeking wider feedback on two implementation issues for GCStrategy. The two key questions are whether GC-specific properties should be checked in the IR verifier and what the access model for GCStrategy should be. No responses yet, so now is the time to dive in.
Ramshankar Ramanarayanan has posted a proof of concept for a loop fusion pass. Meanwhile, Adam Nemet has an RFC for a loop distribution pass.
Lang Hames has posted a proposed new JIT API with a catchy name (ORC: On Request Compilation). The aim is to cleanly support a wider range of JIT use cases, and to be clear this higher level API would not replace the existing MCJIT.
Bjoern Haase has been spending some time examining generated code for armv6 microcontroller targets such as the Cortex M0. He has a series of suggestions for tweaking default optimizer settings for this target.
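
To make the saturation discussion above more concrete, here is a minimal sketch of a signed 32-bit saturating add written as ordinary compare-and-select code. This is the sort of source pattern that would have to be recognised later in the compilation flow if no intrinsic existed. The function name is hypothetical and this is not taken from the RFC.

```cpp
#include <cstdint>
#include <limits>

// Signed 32-bit saturating add: the result is clamped to the representable
// range instead of wrapping around.
std::int32_t saturating_add(std::int32_t a, std::int32_t b) {
    // Widen to 64 bits so the intermediate sum cannot overflow.
    const std::int64_t wide = static_cast<std::int64_t>(a) + b;
    if (wide > std::numeric_limits<std::int32_t>::max())
        return std::numeric_limits<std::int32_t>::max();
    if (wide < std::numeric_limits<std::int32_t>::min())
        return std::numeric_limits<std::int32_t>::min();
    return static_cast<std::int32_t>(wide);
}
```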

LLVM commits

A new code diversity feature is now available. The NoopInsertion pass will add random no-ops to x86 binaries to try to make ROP attacks more difficult by increasing diversity. r225908. I highly recommend reading up on the blind ROP attack published last year. It would also be interesting to see an implementation of G-Free for producing binaries without simple gadgets. The commit was later reverted for some reason.
A nice summary of recent MIPS and PowerPC target developments, as well as the OCaml bindings is now there in the form of the 3.6 release notes. r225607, r225695, r225779.
LLVM learned the llvm.frameallocate and llvm.framerecover intrinsics, which allow multiple functions to share a single stack allocation from one function's call frame. r225746, r225752.
An experimental (disabled by default) 'inductive range check elimination' pass has landed. This attempts to eliminate range checks of the form 0 <= A*I + B < Length. r226201.
StackMap/PatchPoint support is now available for the PowerPC target. r225808.
Initial support for Win64 SEH catch handlers has landed. See the commit message for current missing functionality. r225904.
A new utility script has been started to help update simple regression tests. It needs some work to generalise it beyond x86. r225618.
TargetLibraryInfo has been moved into the Analysis library. r226078.
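
The idea behind the inductive range check elimination item above can be sketched by hand. The first function below checks 0 <= I + B < Length on every iteration (so A = 1); the second computes up front the range of iterations for which the check must pass and runs a check-free loop over it. Both function names are hypothetical, overflow of i + b is ignored, and this is only a simplified source-level picture, not the pass's actual output.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Original form: the range check runs on every iteration.
long sum_checked(const std::vector<int>& v, long n, long b) {
    const long length = static_cast<long>(v.size());
    long total = 0;
    for (long i = 0; i < n; ++i) {
        const long idx = i + b;                // A*I + B with A = 1
        if (idx < 0 || idx >= length) break;   // range check
        total += v[static_cast<std::size_t>(idx)];
    }
    return total;
}

// Hand-transformed form: restrict the iteration space so that no check is
// needed inside the loop body.
long sum_split(const std::vector<int>& v, long n, long b) {
    const long length = static_cast<long>(v.size());
    if (b < 0) return 0;  // the check would already fail at i = 0
    const long safe_end = std::min(n, length - b);
    long total = 0;
    for (long i = 0; i < safe_end; ++i)
        total += v[static_cast<std::size_t>(i + b)];
    return total;
}
```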

Clang commits

The new -fno-inline-asm flag has been added to disallow all inline asm. If it exists in the input code it will be reported as an error. r226340.
-fsanitize-recover command line flags are again supported. r225719.
The integrated assembler is now used by default on 32-bit PowerPC and SPARC. r225958.

Other project commits

The libcxx build system learnt how to cross-compile. r226237.
LLD gained a nice speedup by speculatively instantiating archive file members. This shaves off a second or two for linking lld with lld. r226336.
LLD learnt the --as-needed flag (previously this was the default behaviour). r226274.
OpenMP gained an AArch64 port. r225792.

LLVM Weekly - #54, Jan 12th 2015

Mon, 12 Jan 2015 18:10:00 +0000

Welcome to the fifty-fourth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

As you receive this week's issue, I should be on my way to California where I'll be presenting lowRISC at the RISC-V workshop in Monterey and having a few other meetings. I'm in SF Fri-Sun and somewhat free on the Saturday if anyone wants to meet and chat LLVM or lowRISC/RISC-V.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Euro LLVM 2015 will be held on April 13th-14th in London, UK. The call for papers is now open with a deadline of 16th Feb.

Talks for the LLVM devroom at FOSDEM have been announced. The LLVM devroom is on Sunday 1st Feb. Readers will be pleased to know this doesn't clash with my talk on lowRISC which is on the Saturday.

Google now use Clang for production Chrome builds on Linux. They were previously using GCC 4.6. Compared to that baseline, performance stayed roughly the same while binary size decreased by 8%. It would certainly have been interesting to compare to a more recent GCC baseline. The blog post indicates they're hopeful to use Clang in the future for building Chrome for Windows.

Philip Reames did an interesting back of the envelope calculation about the cost of maintaining LLVM. He picked out commits which seem like they could be trivially automated and guesstimated a cost based on developer time. The figure he arrives at is $1400 per month.

The next LLVM social for Cambridge, UK will be on Wed 21st Jan at 7:30pm.

On the mailing lists

LLVM 3.6 will be branching soon, on 14th January.
Philip Reames asks whether address space 1 is reserved on any architecture. It seems the answer is no, though the thread resulted in some discussion on the use of address spaces and the ability to reserve some. Philip had a strawman proposal for meanings of different address space numbers.
Chandler Carruth has suggested new IR features are needed to represent the cases global metadata is currently used for. Metadata was intended to be used to hold information that can be safely dropped, though this isn't true for e.g. module flags.
Arch Robinson kicked off a discussion about floating point range checks in LLVM. This isn't currently supported, though there's agreement it could be useful as well as a fair amount of discussion on some of the expected subtleties.
If you're wondering about alias instructions, Bruce Hoult has a good explanation.
Note to out-of-tree backend maintainers, get/setLoadExtAction now takes another parameter.
Right now, LLDB will compile entered expressions in C++ mode. As noted on the lldb mailing list this can be problematic when e.g. debugging a C function which has a local variable called 'this'. Greg Clayton points out how helpful supporting C++ expressions can be, even when debugging C code.
A thread about the design of the new pass manager has been revived. Both Chandler Carruth and Philip Reames suggest BasicBlockPasses should die.
Philip Reames is seeking feedback on a transformation which would convert a loop to a loop nest if it contains infrequently executed slow paths. There's some interesting discussion in the thread, and it's also worth reading Duncan P.N. Exon Smith's clarification of branch weight, branch probability, block frequency, and block bias.

LLVM commits

An option hoist-cheap-insts has been added to the machine loop invariant code motion pass to enable hoisting even cheap instructions (as long as register pressure is low). This is disabled by default. r225470.
The calculation of the unrolled loop size has been fixed. Targets may want to re-tune their default threshold. r225565, r225566.
DIE.h (datastructures for DWARF info entries) is now a public CodeGen header rather than being private to the AsmPrinter implementation. dsymutil will make use of it. r225208.
The new pass manager now has a handy utility for generating a no-op pass that forces a usually lazy analysis to be run. r225236.
There's been a minor change to the .ll syntax for comdats. r225302.
There have been some minor improvements to the emacs packages for LLVM and tablegen mode. r225356.
An example GCStrategy using the new statepoint infrastructure has been added. r225365, r225366.

Clang commits

A -Wself-move warning has been introduced. Similar to -Wself-assign, it will warn you when your code tries to move a value to itself. r225581.
The I, J, K, M, N, O inline assembly constraints are now checked. r225244.
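
For readers wondering what the self-move warning above is aimed at, here is a minimal sketch of a statement of that shape. The function name is hypothetical; an int is used so the behaviour stays well defined, whereas self-moving a class type such as std::string leaves it in a valid but unspecified state.

```cpp
#include <utility>

// Hypothetical example: the assignment moves x into itself, which is the
// pattern a self-move warning is meant to flag.
int self_move_example() {
    int x = 5;
    x = std::move(x);  // self-move; harmless for an int, suspicious in general
    return x;
}
```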

Other project commits

The libcxx test infrastructure has been refactored into separate modules. r225532.
The effort to retire InputElement in lld continues. Linker script files are no longer represented as an InputElement. r225330.
Polly has gained a changelog in preparation of the next release. r225264.
Polly has also gained a TODO list for its next phase of development. r225388.

Using clang for Chrome production builds on Linux

Mon, 05 Jan 2015 10:39:00 +0000

Chrome 38 was released early October 2014. It is the first release where the Linux binaries shipped to users are built by clang. Previously, this was done by gcc 4.6. As you can read in the announcement email, the switch happened without many issues. Performance stayed roughly the same, binary size decreased by about 8%. In this post I'd like to discuss the motivation for this switch.

Motivation

There are two reasons for the switch.

1. Many Chromium developers already used clang on Linux. We've supported opting in to clang since before clang supported C++ – because of this, we have a process in place for shipping new clang binaries to all developers and bots every few weeks. Because of clang's good diagnostics (some of which we added due to bugs in Chromium we thought the compiler should catch), speed, and because of our Chromium-specific clang plugin, many Chromium developers switched to clang over the years. Making clang the default compiler removes a stumbling block for people new to the project.

2. We want to use modern C++ features in Chromium. This requires a recent toolchain – we figured we needed at least gcc 4.8. For Chrome for Android and Chrome for Chrome OS, we updated our gcc compilers to 4.8 (and then 4.9) – easy since these ports use a non-system gcc already. Chrome for Mac has been using Chromium's clang since Chrome 15 and was already in a good state. Chrome for iOS uses Xcode 5's clang, which is also new enough. Chrome for Windows uses Visual Studio 2013 Update 4. On Linux, switching to clang was the easiest way forward.

Keeping up with C++'s evolution in a large, multi-platform project

C++ had been static for many years. C++11 is the first real update to the C++ language since the original C++ standard (approved on July 27 1998). C++98 predated the founding of Google, YouTube, Facebook, Twitter, the releases of Mac OS X and Windows XP, and x86 SSE instructions. The time between the two standards saw the rise and fall of the iPod, several waves of social networks, and the smartphone explosion.

The time between C++11 and C++14 was three years, and the next major iteration of the language is speculated to be finished in 2017, three years from C++14. This is a dramatic change, and it has repercussions on how to build and ship C++ programs. It took us 3+ years to get to a state where we can use C++11 in Chromium; C++14 will hopefully take us less long. (If you're targeting fewer platforms, you'll have an easier time.)

There are two parts to C++11: New language features, and new library features. The language features just require a modern compiler at build time on build machines, the library features need a new standard library at runtime on the user's machine.

Deploying a new compiler is conceptually relatively simple. If your developers are on Ubuntu LTS releases and you make them use the newest LTS release, they get new compilers every two years – so just using the default system compiler means you're up to two years behind. There needs to be some process to relatively transparently deploy new toolchains to your developers – an "evergreen compiler". We now have this in place for Chromium – on Linux, by using clang. (We still accept patches to keep Chromium buildable with gccs >= 4.8 for people who prefer compiling locally over using precompiled binaries, and we still use gcc as the target compiler for Chrome for Android and Chrome OS.)

The library situation is slightly more tricky: On Linux and Mac OS X, programs are usually linked against the system C++ library. Chrome wants to support Mac OS X 10.6 a bit longer (our users seem to love this OS X release), and the only C++ library this ships with is libstdc++ 4.2 – which doesn't have any C++11 bits. Similarly, Ubuntu Precise only has libstdc++ 4.6. It seems that with C++ updating more often, products will have to either stop supporting older OS versions (even if they still have many users on these old versions), adopt new C++ features very slowly, or ship with a bundled C++ standard library. The latter implies that system libraries shouldn't have a C++ interface for ABI reasons – luckily, this is mostly already the case.

To make things slightly more complicated, gcc and libstdc++ expect to be updated at the same time. gcc 4.8 links to libstdc++ 4.8, so upgrading to gcc 4.8 while still linking to Precise's libstdc++ 4.6 isn't easy. clang explicitly supports building with older libstdc++ versions.

For Chromium, we opted to enable C++11 language features now, and then allow C++11 library features later once we have figured out the story there. This allows us to incrementally adopt C++11 features in Chromium, but it's not without risks: vector<int> v0{42} for example means something different with an old C++ library and a new C++ library that has a vector constructor taking an initializer_list. We disallow using uniform initialization for now because of this.
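
The vector example above can be spelled out as a small sketch. With a standard library whose vector has an initializer_list constructor, braces select that constructor and produce a single element; per the paragraph above, a library without that constructor would instead treat 42 as a size, which is what parentheses always do. The function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Braces: with a C++11 library, the initializer_list constructor is chosen,
// giving one element with value 42.
std::size_t braced_size() {
    std::vector<int> v0{42};
    return v0.size();
}

// Parentheses: the size constructor is chosen, giving 42 value-initialised
// elements.
std::size_t paren_size() {
    std::vector<int> v1(42);
    return v1.size();
}
```

With a C++11 standard library, braced_size() returns 1 while paren_size() returns 42.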

Since bundling a C++ library seems to become more common with this new C++ update cadence, it would be nice if compiler drivers helped with this. Just statically linking libstdc++ / libc++ isn't enough if you're shipping a product consisting of several executables or shared libraries – they need to dynamically link to a shared C++ library with the right rpaths, the C++ library probably needs mangled symbol names that don't conflict with the system C++ library which might be loaded into the same process due to other system libraries using it internally (for example, maybe using an inline namespace with an application-specific name), etc.
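
The inline namespace idea mentioned above can be sketched as follows. The names bundled and myapp_v1 are hypothetical. Callers use the outer name only, but the inline namespace becomes part of each symbol's mangled name, so two copies of a library built with different inline namespace names do not collide. libc++ uses the same mechanism with its std::__1 namespace.

```cpp
// Hypothetical bundled library: the inline namespace name is
// application-specific and ends up in the mangled symbol names.
namespace bundled {
inline namespace myapp_v1 {

int answer() { return 42; }

}  // inline namespace myapp_v1
}  // namespace bundled

// Callers do not need to mention the inline namespace.
int call_through_outer_name() { return bundled::answer(); }
```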

Future directions

As mentioned above, we're trying to figure out the C++ library situation. The tricky cases are Chrome for Android (which currently uses STLport) and Chrome for Mac. We're hoping to switch Chrome for Android to libc++ (while still using gcc as compiler). On Mac, we'll likely bundle libc++ with Chrome too.

We're working on making clang usable for compiling Chrome for Windows. The main motivations for this are using AddressSanitizer, providing a compiler with great diagnostics for developers, and getting our tooling infrastructure working on Windows (used for example for automated large-scale cross-OS refactoring and for building our code search index – try clicking a few class names; at the moment only code built on Linux is hyperlinked). We won't use clang as a production compiler on Windows unless it produces a chrome binary that's competitive with Visual Studio's on both binary size and performance. (From an open-source perspective, it is nice being able to use an open-source compiler to compile an open-source program.)

You can reach us at [email protected]

LLVM Weekly - #53, Jan 5th 2015

Mon, 05 Jan 2015 06:30:00 +0000

Welcome to the fifty-third issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

I'm going to be in California next week for the RISC-V workshop. I'm arriving at SFO on Monday 12th and leaving on Sunday the 18th. Do let me know if you want to meet and talk lowRISC/RISC-V or LLVM, and we'll see what we can do.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

I was getting ready to break out gitstats for some analysis of the LLVM repo and I find to my delight that Phoronix has saved me the trouble and has shared some stats on activity in the LLVM repo over the past year.

Tom Stellard has made a blog post announcing some recent RadeonSI performance improvements on his LLVM development branch. This includes 60% improvement in one OpenCL benchmark and 10-25% in a range of other OpenCL tests.

Gaëtan Lehmann has written a blog post about getting started with libclang using the Python bindings.

The C++ Filesystem Technical Specification, based on the Boost.Filesystem library has been approved.

On the mailing lists

Virgile Bello has some questions on how he can control the calling convention in LLVM. In this case, he has a CLR frontend and is trying to pass an object on the CLR stack to a native Win32 function. Reid Kleckner suggests the best way may be to just link with Clang and use its implementation. In another followup, he links to the talk on this topic at the last LLVM dev meeting.
Is anybody using the ModuleBuilder class in Clang? If so, now is the time to speak up as it's slated to be removed.
Sami Liedes has set up a new bot to test Clang with fuzzed inputs. A report from the bot is available here and the code is here.
The release of LLVM/Clang 3.5.1 may be slightly delayed due to the addition of new patches late in the process. Chandler Carruth points out that there are some unpleasant bugs in InstCombine in the current 3.5.1 release candidate. If there is a release candidate 3, the patch in question will definitely make it in.

LLVM commits

Instruction selection for bit-permuting operations on PowerPC has been improved. r225056.
The scalar replacement of aggregates (SROA) pass has started to learn how to more intelligently handle split loads and stores. As explained in detail in the commit message, the old approach led to complex IR that can be difficult for the optimizer to work with. SROA is now also more aggressive in its splitting of loads. r225061, r225074.
InstCombine will now try to transform A-B < 0 into A < B. r225034.
The Hexagon (a Qualcomm DSP) backend has seen quite a lot of work recently. Interested parties are best off flicking through the commit log of lib/Target/Hexagon. r225005, r225006, etc.
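
The InstCombine rewrite of A-B < 0 into A < B mentioned above can be illustrated at the source level. The two hypothetical functions below agree whenever a - b does not overflow. Signed overflow is undefined in C++, which is why the rewrite is acceptable for code like this. This is a sketch of the equivalence only, not of the pass itself.

```cpp
// Comparison written via a subtraction. Only meaningful when a - b does not
// overflow, since signed overflow is undefined behaviour in C++.
bool less_via_subtraction(int a, int b) { return a - b < 0; }

// The simpler form the subtraction-based comparison can be rewritten to.
bool less_via_comparison(int a, int b) { return a < b; }
```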

Clang commits

More crash bugs have been uncovered and fixed by the naive fuzzing technique previously covered in LLVM Weekly. e.g. r224915.

Other project commits

The lldb website has been updated with more information about LLDB on Windows, including build instructions. r225023.

LLVM Weekly - #52, Dec 29th 2014

Mon, 29 Dec 2014 02:20:00 +0000

Welcome to the fifty-second issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

This issue marks the end of one full year of LLVM Weekly. It's a little shorter than usual as the frenetic pace of LLVM/Clang development has slowed over the holiday period. Surprising even to me is that we managed to make it through all 52 weeks with an issue every Monday as promised. This requires a non-trivial amount of time each week (2-3+ hours), but I am intending to keep it going into 2015. I'd like to give a big thank you to everyone who's said hi at a conference, sent in corrections or tips on content, or just sent a random thank you. It's been very helpful in motivation. I don't currently intend to change anything about the structure or content of each issue for next year, but if you have any ideas then please let me know.

I can't make it to 31C3 due to the awkward timing of the event, but do let me know if there are any LLVM/Clang related talks worth sharing. There was a talk about Code Pointer Integrity which has previously been covered in LLVM Weekly and is working towards upstreaming. The video is here. If you're interested in lowRISC and at 31C3, Bunnie is leading a discussion about it at 2pm on Monday (today).

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

There doesn't seem to have been any LLVM or Clang related news over the past week. Everyone seems to be busy with non-LLVM related activities over the Christmas break. If you're looking for a job though, Codeplay tell me they have two vacancies: one for a debugger engineer and another for a compiler engineer.

On the mailing lists

David Li has shared some early info on Google's plans for LTO. He describes the concept of 'peak optimization performance' and some of the objectives of the new design. This includes the ability to handle programs 10x or 100x the size of Firefox. We can expect more information in 2015, maybe as early as January.
The discussion on possible approaches to reducing the size of libLLVM has continued. Chris Bieneman has shared some more size stats. These gains come from removing unused intrinsics. Chandler Carruth has followed up with a pleasingly thought-provoking argument on a different approach: target-specific intrinsics shouldn't exist in the LLVM front or middle-end. He describes the obvious issues with this, with the most fiddly probably being instruction selection converting appropriate IR to the right target-specific functionality.

LLVM commits

The SROA (scalar replacement of aggregates) pass has seen some refactoring to, in the future, allow for more intelligent rewriting. r224742, r224798.
The masked load and store intrinsics have been documented. r224832.
CodeGenPrepare learned to speculate calls to llvm.cttz/ctlz (count trailing/leading zeroes) if isCheapToSpeculateCtlz/isCheapToSpeculateCttz in TargetLowering return true. r224899.

Clang commits

The Clang internals manual has been extended with stub sections on Parse, Sema, and CodeGen. r224894.

Other project commits

The libcxx LIT test-suite has seen a number of new configuration options. Even better, these are now documented. r224728.

LLVM Weekly - #51, Dec 22nd 2014

Mon, 22 Dec 2014 02:57:00 +0000

Welcome to the fifty-first issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

Last week as part of the lowRISC project I was involved in sharing our plans for tagged memory and 'minion' cores in the initial version. We've almost made it a full year of LLVM Weekly with no interruption of service!

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

3.5.1-rc2 has been tagged, time to get testing again.

Version 0.15.1 of LDC, the LLVM D Compiler, has been released. The most prominent feature is probably the addition of preliminary support for MSVC on Win64.

SN Systems (part of Sony) have written a blog post describing their recently contributed ABI test suite.

Peter Wilmott has benchmarked Ruby across various GCC and Clang releases. The discussion at HN may be of interest.

On the mailing lists

Elena Demikhovsky has posted a proposal for indexed load and store intrinsics. These are intended for AVX-512 or AVX2 gather/scatter instructions which allow read/write access to multiple memory addresses.
Chad Rosier kicked off a discussion on lowering switch statements in the presence of data from profile guided optimisation. There's some quite detailed discussion about when to use a Huffman tree vs a jump table.
Andrew Kaylor has posted his impression of what needs to be done for MSVC exception handling support. Reid Kleckner's response is informative.
Ulrich Weigand is taking over ownership of the SystemZ port.
LLVM/Clang 3.6 is expected to branch in January. Marshall Clow has shared a summary of the timings of releases over the past few years.
Sean Silva has shared some thorough notes on the use of standard deviation and benchmarking in general.

LLVM commits

Metadata is now typeless in assembly. r224257.
PowerPC instruction selection for bit-permuting operations has been improved. r224318.
An optimisation has been added to move sign/zero extends close to loads which causes performance improvements of 2-3% on a few benchmarks on x86. r224351.
More overflow arithmetic intrinsics are strength reduced into regular arithmetic operations if possible. r224417.
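
To make the overflow-intrinsic item above more concrete, here is a small sketch in C of checked arithmetic that can never actually overflow, which is the kind of case where a checked operation can be reduced to a regular one. It uses the `__builtin_sadd_overflow` builtin provided by Clang and GCC; the function names are made up for illustration, and whether a particular compiler version performs this exact simplification is an assumption rather than something stated in the commit summary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Checked add of two 8-bit values widened to int. The sum always lies in
 * [-256, 254], so the overflow flag can never be set. */
bool add_small_checked(int8_t a, int8_t b, int *out) {
  return __builtin_sadd_overflow((int)a, (int)b, out);
}

/* What the checked version is equivalent to once the compiler has proven
 * that overflow is impossible: a plain add and a constant "no overflow". */
bool add_small_plain(int8_t a, int8_t b, int *out) {
  *out = (int)a + (int)b;
  return false;
}
```

Both functions return the same result for every pair of inputs, so an optimiser that can see the operand ranges is free to emit the second form for the first.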

Clang commits

Codegen for 'omp for' has started to be committed. r224233.
-save-temps will now emit unoptimized bitcode files. r224688.

Other project commits

The libcxx test suite can be run with ccache now. r224603.
Breakpoints can now be tagged with a name in lldb. r224392.

LLVM Weekly - #50, Dec 15th 2014

Mon, 15 Dec 2014 03:29:00 +0000

Welcome to the fiftieth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

I'll be at MICRO-47 this week. If you're there do say hi, especially if you want to chat about LLVM or lowRISC/RISC-V.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The videos and slides from the 2014 LLVM dev meeting went online last week. I already linked to them then, but there's enough interesting stuff there I think I can justify linking again.

LLVM/Clang 3.5.1-rc1 has been tagged. Volunteer testers are very welcome.

Clang UPC 3.4.1 has been released. This is a Unified Parallel C compiler that can target SMP systems or Portals4.

On the mailing lists

Discussion has continued on future plans for GC in LLVM with input from Russel Hadley at Microsoft and Ben Karel, who seems to be the most extensive user of the existing GC infrastructure with his Foster language.
Chris Bieneman started a discussion about supporting stripping out unused intrinsics with the aim of reducing the size of libLLVM. The proposed patches reduce binary size by ~500k, which he later points out is more significant in the context of their already size-reduced build.
Marshall Clow has shared a proposal on how to manage ABI changes in libc++. The proposal involves introducing macros to enable ABI-breaking changes.

LLVM commits

The LLVM Kaleidoscope tutorial has been extended with an 8th chapter, describing how to add debug information using DWARF and DIBuilder. r223671. A rendered version can be found here.
Extensive documentation has been added for the MergeFunctions pass. r223931.
A monster commit to split Metadata from the Value class hierarchy has landed. r223802.
InstrProf has been born. This involves the llvm.instrprof_increment intrinsic and the -instrprof pass. This moves logic from Clang's CodeGenPGO into LLVM. r223672.
With the addition of support for SELECT nodes, the MIPS backend now supports codegen of MIPS-II targets on the LLVM test-suite. Code generation has also been enabled for MIPS-III. r224124, r224128.
Work has started on an LLVM-based dsymutil tool, with the aim to replace Darwin's dsymutil (a DWARF linker). r223793.
LiveInterval has gained support to track the liveness of subregisters. r223877.
Work has started on converting moves to pushes on X86 when appropriate. r223757.
Print and verify passes are now added after each MachineFunctionPass by default, rather than on some arbitrarily chosen subset. r224042.
LLVM now requires Python 2.7. Previously 2.5 was required. r224129.

Clang commits

The __builtin_call_with_static_chain GNU extension has been implemented. r224167.
Clang's CodeGenPGO has moved to using the new LLVM -instrprof pass. r223683.
Clang now accepts Intel microarchitecture names as the -march argument. r223776.

Other project commits

libcxx gained relational operators in std::experimental::optional. r223775.
libcxx can now be built as a 32-bit library. r224096.
The lldb unwinder has learned to use unwind information from the compact-unwind section for x86-64 and i386 on Darwin. r223625.

LLVM Weekly - #49, Dec 8th 2014

Mon, 08 Dec 2014 08:14:00 +0000

Welcome to the forty-ninth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Most of the 2014 LLVM Developers' Meeting videos and slides are now online. Sadly, there are no videos from the talks by Apple employees yet. Hopefully they'll be appearing later.

QuarksLab has a rather nice write-up of deobfuscating an OLLVM-protected program.

The LLVM-based ELLCC has been making progress on ELK, a bare-metal POSIX-like environment.

Support for statepoints landed in LLVM this week, and Philip Reames has a blog post detailing some notes and caveats. See also the mailing list discussion linked to below about future plans for GC in LLVM.

On the mailing lists

Sami Liedes shares his workflow for using afl-fuzz with the Clang test suite. In 11 hours of testing he managed to find 34 distinct assertion failures and at least one segmentation fault.
Duncan P.N. Exon Smith has shared an update on his work on the metadata-value split, detailing the new semantic restrictions this will entail.
Philip Reames has a post detailing his future plans for GC in LLVM. Comments are invited. The aim is to eventually delete the existing gcroot lowering code. If you are actively using this, please do speak up.
John Yates, who worked on the compiler for the Apollo Computer's DN10K, has shared a description of how that compiler would have handled one of the recent examples from the Souper work.
How can you reproduce Clang's -O3 using opt? The answer, thanks to Tobias Grosser, is clang -O3 -mllvm -disable-llvm-optzns followed by opt -O3.
Tobias Grosser is seeking community feedback on where in the pipeline the Polly loop optimiser should run. The post is well worth a read for the discussion of expected trade-offs.
Rafael Espíndola has been working on type merging during LTO and ultimately proposes moving to a single pointer type in LLVM IR. There seems to be positive feedback on the idea, given that pointer types don't convey useful information to the optimizer and don't really provide safety.

LLVM commits

The statepoint infrastructure for garbage collection has landed. See the final patch in the series for documentation. r223078, r223085, r223137, r223143.
The LLVM assembler gained support for ARM's funky modified-immediate assembly syntax. r223113.
The OCaml bindings now have a CMake buildsystem. r223071.
The PowerPC backend gained support for readcyclecounter on PPC32. r223161.
Support for 'prologue' metadata on functions has been added. This can be used for inserting arbitrary code at a function entrypoint. This was previously known as prefix data, and that term has been recycled to be used for inserting data just before the function entrypoint. r223189.
PowerPC gained a Power8 instruction schedule definition. r223257.

Clang commits

LLVM IR for vtable addresses now uses the type of the field being pointed to, to enable more optimisations. r223267.
New attributes have been added to specify AMDGPU register limits. This is a performance hint that can be used to attempt to limit the number of used registers. r223384.
Clang gained the __has_declspec_attribute preprocessor macro. r223467.
__has_attribute now only looks for GNU-style attributes. You should be able to use __has_cpp_attribute or __has_declspec_attribute instead. r223468.
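
As an illustration of the attribute feature-check macros mentioned above, the following sketch picks an attribute spelling at preprocessing time. The `MY_NOINLINE` macro and the `add_one` function are hypothetical names used only for this example; the fallback definitions are there so the code also compiles on compilers that lack these macros.

```c
/* Provide fallbacks so the checks below are valid on any compiler. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif
#ifndef __has_declspec_attribute
#define __has_declspec_attribute(x) 0
#endif

/* Prefer the GNU-style spelling, then the __declspec spelling,
 * and otherwise expand to nothing. */
#if __has_attribute(noinline)
#define MY_NOINLINE __attribute__((noinline))
#elif __has_declspec_attribute(noinline)
#define MY_NOINLINE __declspec(noinline)
#else
#define MY_NOINLINE
#endif

MY_NOINLINE int add_one(int x) { return x + 1; }
```

Checking each attribute syntax with its own macro avoids assuming that support for one spelling implies support for another, which is the distinction the change above makes explicit.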

Other project commits

DataFlowSanitizer is now supported for MIPS64. r223517.
libcxx now supports std::random_device on (P)NaCl. r223068.
An effort has started in lld to reduce abstraction around InputGraph, which has been found to get in the way of new features due to excessive information hiding. r223330. The commit has been temporarily reverted due to breakage on Darwin and ELF.
A large chunk of necessary code for Clang module support has been added to LLDB. r223433.
LLDB now has documented coding conventions. r223543.

LLVM Weekly - #48, Dec 1st 2014

Mon, 01 Dec 2014 06:57:00 +0000

Welcome to the forty-eighth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

John Regehr has posted an update on the Souper superoptimizer which he and his collaborators have been working on. They have implemented a reducer for Souper optimizations that tries to reduce the optimization to something more minimal. Their current results give ~4000 distinct optimisations, of which ~1500 LLVM doesn't know how to do. Of course many of these may in fact be covered by a single rule or pass. One of the next steps for Souper is to extend it to support the synthesis of instruction sequences. See also the discussion on the llvm mailing list.

The LLVM Blog features a summary of recent advances in loop vectorization for LLVM. This includes diagnostics remarks to get feedback on why loops which aren't vectorized are skipped, the loop pragma directive in Clang, and performance warnings when the directive can't be followed.

The LLVM Haskell Compiler (LHC) has been newly reborn along with its blog. The next steps in development are to provide better support for Haskell2010, give reusable libraries for name resolution and type checking, and to produce human-readable compiler output.

The next LLVM Social in Paris will take place on December 9th.

Intel have published a blog post detailing new X86-specific optimisations in GCC 5.0. You may also be interested in the discussion of this post on Hacker News.

On the mailing lists

Hal Finkel has posted an RFC suggesting the removal of the BBVectorize pass. His reasoning is that it hasn't progressed to production quality and has various bugs and code fixmes, while the SLP vectorizer exists and has been enabled for some time. If you feel differently, now is the time to speak up.
Yichao Yu is curious about the current state of MCJIT for ARM. Several people responded to say they've been using it with few problems on ARM, though Renato Golin would like to see a few more success stories before marking it as 'supported' on the appropriate status page.
Tom Stellard is planning to start the 3.5.1 release cycle shortly. Let him know if you'd like to help with testing.
When developing a non-upstreamed LLVM backend, should you do it as a loadable module or just apply to a cloned LLVM repo? Rafael Auler has tried the approach of building his backend as a loadable module and feels it would have been better to fork LLVM and rebase when necessary. This is the approach your esteemed editor has taken (though admittedly it's been far too long since he rebased...).

LLVM commits

Support for -debug-ir (emitting the LLVM IR in debug data) was removed. There's no real justification or explanation in the commit message, but it's likely it was unfinished/unused/non-functional. r222945.
InstCombine will now canonicalize toward the value type being stored rather than the pointer type. The rationale (explained in more detail in the commit message) is that memory does not have a type, but operations and the values they produce do. r222748.
The documentation for !invariant.load metadata has been clarified. r222700.
In tablegen, neverHasSideEffects=1 is now hasSideEffects=0. r222801.

Clang commits

Four new ASTMatchers have been added: typedefDecl, isInMainFile, isInSystemFile, and isInFileMatchingName. r222646.
The documentation on MSVC compatibility has been updated to represent the current state of affairs. Clang has also gained support for rethrowing MS C++ exceptions. r222731, r222733.

Other project commits

Initial tests have been added for lldb-mi (the LLDB machine interface). r222750.
libcxxabi can now be built and tested without threads using CMake. r222702.
The compact-unwind-dumper tool now has complete support for x86-64 and i386 binaries. r222951.

Loop Vectorization: Diagnostics and Control

Mon, 24 Nov 2014 15:12:00 +0000

Loop vectorization was first introduced in LLVM 3.2 and turned on by default in LLVM 3.3. It has been discussed previously on this blog in 2012 and 2013, as well as at FOSDEM 2014, and at Apple's WWDC 2013. The LLVM loop vectorizer combines multiple iterations of a loop to improve performance. Modern processors can exploit the independence of the interleaved instructions using advanced hardware features, such as multiple execution units and out-of-order execution, to improve performance.

Unfortunately, when loop vectorization is not possible or profitable the loop is silently skipped. This is a problem for many applications that rely on the performance vectorization provides. Recent updates to LLVM provide command line arguments to help diagnose vectorization issues and a new pragma syntax for tuning loop vectorization, interleaving, and unrolling.

New Feature: Diagnostics Remarks

Diagnostic remarks provide the user with an insight into the behavior of LLVM's optimization passes including unrolling, interleaving, and vectorization. They are enabled using the -Rpass command line arguments. Interleaving and vectorization diagnostic remarks are produced by specifying the 'loop-vectorize' pass. For example, specifying '-Rpass=loop-vectorize' tells us the following loop was vectorized by 4 and interleaved by 2.

void test1(int *List, int Length) {
  int i = 0;
  while(i < Length) {
    List[i] = i*2;
    i++;
  }
}

clang -O3 -Rpass=loop-vectorize -S test1.c -o /dev/null

test1.c:4:5: remark: vectorized loop (vectorization factor: 4, unrolling interleave factor: 2)
    while(i < Length) {
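
To give a feel for what "vectorized by 4 and interleaved by 2" means, the sketch below writes out the same computation by hand in plain C: each trip of the main loop handles two groups of four elements, followed by a scalar loop for the leftover iterations. This is only a conceptual model with made-up function names; the real vectorizer emits SIMD instructions rather than inner loops, and the exact code it generates is not shown in the post.

```c
/* Reference version: one element per iteration. */
void test1_scalar(int *List, int Length) {
  for (int i = 0; i < Length; i++)
    List[i] = i * 2;
}

/* Conceptual model of vectorization factor 4, interleave factor 2:
 * eight elements are processed per trip of the main loop. */
void test1_blocked(int *List, int Length) {
  int i = 0;
  for (; Length - i >= 8; i += 8) {
    /* First group of four lanes (one vector operation). */
    for (int lane = 0; lane < 4; lane++)
      List[i + lane] = (i + lane) * 2;
    /* Second, independent group of four lanes (the interleaved copy). */
    for (int lane = 0; lane < 4; lane++)
      List[i + 4 + lane] = (i + 4 + lane) * 2;
  }
  /* Scalar remainder for the last Length % 8 elements. */
  for (; i < Length; i++)
    List[i] = i * 2;
}
```

The two groups in the main loop do not depend on each other, which is what lets a processor with multiple execution units work on them at the same time.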

Many loops cannot be vectorized including loops with complicated control flow, unvectorizable types, and unvectorizable calls. For example, to prove it is safe to vectorize the following loop we must prove that array 'A' is not an alias of array 'B'. However, the bounds of array 'A' cannot be identified.

void test2(int *A, int *B, int Length) {
  for (int i = 0; i < Length; i++)
    A[B[i]]++;
}

clang -O3 -Rpass-analysis=loop-vectorize -S test2.c -o /dev/null

test2.c:3:5: remark: loop not vectorized: cannot identify array bounds
    for (int i = 0; i < Length; i++)

Control flow and other unvectorizable statements are reported by the '-Rpass-analysis' command line argument. For example, many uses of 'break' and 'switch' are not vectorizable.

C/C++ code:

for (int i = 0; i < Length; i++) {
  if (A[i] > 10.0)
    break;
  A[i] = 0;
}

-Rpass-analysis=loop-vectorize output:

control_flow.cpp:5:9: remark: loop not vectorized: loop control flow is not understood by vectorizer
        if (A[i] > 10.0)
        ^

C/C++ code:

for (int i = 0; i < Length; i++) {
  switch(A[i]) {
  case 0: B[i] = 1; break;
  case 1: B[i] = 2; break;
  default: B[i] = 3;
  }
}

-Rpass-analysis=loop-vectorize output:

no_switch.cpp:4:5: remark: loop not vectorized: loop contains a switch statement
    switch(A[i]) {
    ^
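
One common way to deal with an early 'break' like the one reported above is to split the loop into a search phase and a simple counted loop. The sketch below shows both forms side by side; the function names are hypothetical, and whether a given compiler then vectorizes the second loop (or turns it into a memset) is an assumption that should be checked with the remarks described in this post.

```c
#include <stddef.h>

/* Original shape: the early exit makes the trip count depend on the data. */
size_t clear_prefix_break(float *A, size_t Length) {
  size_t i;
  for (i = 0; i < Length; i++) {
    if (A[i] > 10.0f)
      break;
    A[i] = 0;
  }
  return i;
}

/* Split shape: first find where to stop, then run a loop with a known
 * trip count and no control flow in its body. */
size_t clear_prefix_split(float *A, size_t Length) {
  size_t n = 0;
  while (n < Length && !(A[n] > 10.0f))
    n++;
  for (size_t i = 0; i < n; i++)
    A[i] = 0;
  return n;
}
```

Both functions zero the same elements and return the index at which they stopped; only the second loop of the split version is a candidate for vectorization, while the search loop remains scalar.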

New Feature: Loop Pragma Directive

Explicit control over the behavior of vectorization, interleaving and unrolling is necessary to fine tune the performance. For example, when compiling for size (-Os) it's a good idea to vectorize the hot loops of the application to improve performance. Vectorization, interleaving, and unrolling can be explicitly specified using the #pragma clang loop directive prior to any for, while, do-while, or C++11 range-based for loop. For example, the vectorization width and interleaving count are explicitly specified for the following loop using the loop pragma directive.

void test3(float *Vx, float *Vy, float *Ux, float *Uy, float *P, int Length) {
#pragma clang loop vectorize_width(4) interleave_count(4)
#pragma clang loop unroll(disable)
  for (int i = 0; i < Length; i++) {
    float A = Vx[i] * Ux[i];
    float B = A + Vy[i] * Uy[i];
    P[i] = B;
  }
}

clang -O3 -Rpass=loop-vectorize -S test3.c -o /dev/null

test3.c:5:5: remark: vectorized loop (vectorization factor: 4, unrolling interleave factor: 4)
    for (int i = 0; i < Length; i++) {

Integer Constant Expressions

The options vectorize_width, interleave_count, and unroll_count take an integer constant expression, so the value can be computed at compile time, as in the example below.

template <int ArchWidth, int ExecutionUnits>
void test4(float *Vx, float *Vy, float *Ux, float *Uy, float *P, int Length) {
#pragma clang loop vectorize_width(ArchWidth)
#pragma clang loop interleave_count(ExecutionUnits * 4)
  for (int i = 0; i < Length; i++) {
    float A = Vx[i] * Ux[i];
    float B = A + Vy[i] * Uy[i];
    P[i] = B;
  }
}

void compute_test4(float *Vx, float *Vy, float *Ux, float *Uy, float *P, int Length) {
  const int arch_width = 4;
  const int exec_units = 2;
  test4<arch_width, exec_units>(Vx, Vy, Ux, Uy, P, Length);
}

clang -O3 -Rpass=loop-vectorize -S test4.cpp -o /dev/null

test4.cpp:6:5: remark: vectorized loop (vectorization factor: 4, unrolling interleave factor: 8)
    for (int i = 0; i < Length; i++) {
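
The template example above is C++. In plain C, enumeration constants are integer constant expressions, so a similar effect can plausibly be had as sketched below. This variant is an assumption that does not come from the post: the enum names and the `test4_c` function are made up, and the pragmas are guarded so that other compilers simply skip them.

```c
/* Enumeration constants are integer constant expressions in C. */
enum { ArchWidth = 4, ExecutionUnits = 2 };

void test4_c(const float *Vx, const float *Vy, const float *Ux,
             const float *Uy, float *P, int Length) {
#ifdef __clang__
#pragma clang loop vectorize_width(ArchWidth)
#pragma clang loop interleave_count(ExecutionUnits * 4)
#endif
  for (int i = 0; i < Length; i++) {
    float A = Vx[i] * Ux[i];
    float B = A + Vy[i] * Uy[i];
    P[i] = B;
  }
}
```

Compiling with -Rpass=loop-vectorize, as in the examples above, is the way to confirm which factors were actually applied.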

Performance Warnings

Sometimes the loop transformation is not safe to perform. For example, vectorization fails due to the use of complex control flow. If vectorization is explicitly specified, a warning message is produced to alert the programmer that the directive cannot be followed. For example, the following function, which returns the index of the last positive value in the list, cannot be vectorized because the 'last_positive_index' variable is used outside the loop.

int test5(int *List, int Length) {
  int last_positive_index = 0;
#pragma clang loop vectorize(enable)
  for (int i = 1; i < Length; i++) {
    if (List[i] > 0) {
      last_positive_index = i;
      continue;
    }
    List[i] = 0;
  }
  return last_positive_index;
}

clang -O3 -g -S test5.c -o /dev/null

test5.c:5:9: warning: loop not vectorized: failed explicitly specified loop vectorization
    for (int i = 1; i < Length; i++) {

The debug option '-g' allows the source line to be provided with the warning.

Conclusion

Diagnostic remarks and the loop pragma directive are two new features that are useful for feedback-directed-performance tuning. Special thanks to all of the people who contributed to the development of these features. Future work includes adding diagnostic remarks to the SLP vectorizer and an additional option for the loop pragma directive to declare the memory operations as safe to vectorize. Additional ideas for improvements are welcome.

LLVM Weekly - #47, Nov 24th 2014

Mon, 24 Nov 2014 06:00:00 +0000

Welcome to the forty-seventh issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Version 3.0 of the Capstone disassembly framework has been released. Python bindings have been updated to support Python 3, and this release also adds support for Sparc, SystemZ and XCore. It also has performance improvements.

Herb Sutter has penned a trip report of the recent ISO C++ meeting.

Emscripten has updated to use LLVM 3.4 from the PNaCl team. There's more work to be done to rebase on top of 3.5.

Woboq has written a blog post detailing C++14 features of interest to Qt programmers, though I suspect the article has a wider potential audience than that. Recent Clang of course has good support for the new C++14 features.

There is going to be an LLVM Devroom at FOSDEM 2015, and the submission deadline for presentations/talks/tutorials is on Dec 1st.

Apple's LLVM Source Tools and Program Analysis teams are looking for interns for Summer 2015.

On the mailing lists

If you're wondering how the process of adding OpenMP support to Clang is going, the answer is that it's still ongoing and there's hope it will be done by the 3.6 release, depending on the speed of code reviews.
Siva Chandra kicked off a discussion on the mailing list about how to better manage breakages caused by LLVM or Clang API changes. Siva suggests LLDB should be developed against a known-good version of LLVM/Clang that gets periodically bumped. Vince Harron says that he is looking to add a continuous build on curated versions of Clang/LLVM in addition to a continuous build on top of tree for everything. This should help improve the signal to noise ratio and make it easier for LLDB developers to tell when a breaking change is due to their addition or a change elsewhere. Reid Kleckner suggests LLDB should be treated as part of the same project as Clang/LLVM and more pressure should be put on developers to fix breakages, presumably in the same way that API changes in LLVM almost always come with an associated patch to fix Clang.
Peter Collingbourne has proposed adding the llgo frontend to the LLVM project. Chris Lattner is in favour of this, but would like to see the GPLv3+runtime exception dependencies rewritten before being checked in. Some people in the thread expressed concern that the existing base of LLVM/Clang reviewers know C++ and may not be able to review patches in Go, though it looks like a non-zero number of existing LLVM reviewers are appropriately multilingual.
Brett Simmers is working on HHVM and is interested in whether there are ways to control where a BasicBlock ends up in memory, with the motivation of making the best use of the instruction cache by keeping frequently executed pieces of code closer together. There's general agreement this would be a great feature to have, but it doesn't sound like this is easily supported in LLVM right now.

LLVM commits

A small doc fix has the honour of being commit 222222.
A nice little optimisation has been committed which replaces a switch table with a mul and add if there is a linear mapping between index and output. r222121.
The SeparateConstOffsetFromGEP, EarlyCSE, and LICM passes have been enabled on AArch64. This has measurable gains for some SPEC benchmarks. r222331.
The description of the noalias attribute has been clarified. r222497.
MDNode is being split into two classes, GenericMDNode and MDNodeFwdDecl. r222205.
The LLVM CMake-based build system learned to support LLVM_USE_SANITIZER=Thread. r222258.
The R600 backend gained the SIFoldOperands pass which attempts to fold source operands of mov and copy instructions into their uses. r222581.
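
The switch-table optimisation listed above (r222121) can be pictured with a small C example. When the values produced by a dense switch follow a linear pattern, the lookup can be expressed as a multiply and an add instead of a table load. The functions below are hypothetical and hand-written to show the equivalence; they are not taken from the commit.

```c
/* A dense switch whose results follow value = 3 * idx + 7. */
int map_switch(unsigned idx) {
  switch (idx) {
  case 0: return 7;
  case 1: return 10;
  case 2: return 13;
  case 3: return 16;
  default: return -1;
  }
}

/* The same mapping computed with a multiply and an add after a
 * range check, with no table in memory. */
int map_linear(unsigned idx) {
  return idx < 4 ? (int)(3 * idx + 7) : -1;
}
```

The range check is still needed for the default case; the saving comes from no longer storing and loading the table of results.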

Clang commits

Clang now distinguishes between -fpic and -fPIC. r222227.
The -Wuninitialized warning will now trigger when accessing an uninitialized base class in a constructor. r222503.

Other project commits

LLDB can now perform basic debugging operations on Windows. r222474.
LLDB's line editing support has been completely rewritten. r222163.
MemorySanitizer gained support for MIPS64. r222388.
A sample tool was added to lldb to extract and dump unwind information from Darwin's compact unwind section. r222127.

LLVM Weekly - #46, Nov 17th 2014

Tue, 18 Nov 2014 05:43:00 +0000

Welcome to the forty-sixth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Chrome on Linux now uses Clang for production builds. Clang has of course been used on OS X Chrome for quite some time. The switch saw a reduction in binary size of ~8%, but this was vs GCC 4.6 rather than something more up-to-date.

The LLVM in HPC workshop at SC14 is taking place on Monday and the full agenda with abstracts is available online.

On the mailing lists

Duncan P.N. Exon Smith has posted an RFC on splitting out metadata from the Value hierarchy. There seems to be general support for the idea. If you have concerns, now is the time to speak up.
Tom Stellard has posted a proposed LLVM/Clang 3.5.1 release schedule. RC1 is currently planned for November 26th.
Zachary Turner raised the issue of referencing Apple rdar bugs in commit messages. The concern is that sometimes the commit messages are hard to work out without the context of the bug, which many of us do not have access to.

LLVM commits

Work on call lowering for MIPS FastISel has started. r221948.
Work has started on an assembler for the R600 backend. r221994.
A pass implementing forward control-flow integrity has been added. r221708.
A whole slew of patches that made MDNode a Value have been reverted due to a change in plan. The aim is now to separate metadata from the Value hierarchy. r221711.
There are two ways to inform the optimizer the result of a load is never null. Either with metadata or via assume. The latter is now canonicalized into the former. r221737.
vec_vsx_ld and vec_vsx_st intrinsics have been added for PowerPC. r221767.
PowerPC gained support for small-model PIC. r221791.
The llvm.arm.space intrinsic was added to make it easier to write tests for ARM ConstantIslands. r221903.

Clang commits

The constant trickle of OpenMP patches continues. Codegen for threadprivate variables has been added. r221663.
Support for __has_cpp_attribute is now present. r221991.

Other project commits

Breakpoint stop/resume has been implemented on Windows for LLDB. r221642.
The libcxx status page has been updated with the current state of C++1z support. r221601.

LLVM Weekly - #45, Nov 10th 2014

Mon, 10 Nov 2014 05:11:00 +0000

Welcome to the forty-fifth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Adrian Sampson has posted a status update on his Quala project to add custom type annotations to C and C++ in Clang/LLVM.

Bruce Mitchener has posted to the Dylan blog describing how Dylan integrates with LLVM. Interestingly, Dylan doesn't link with the LLVM libraries and instead generates bitcode files directly.

The Numba project has released llvmlite, lightweight python bindings to LLVM for writing JIT compilers. This was developed based on experience using the old llvmpy bindings.

Obfuscator-LLVM has been updated to work with LLVM 3.5.

On the mailing lists

Arnaud A. de Grandmaison kicked off a discussion on the semantics of lifetime.start and lifetime.end intrinsics. Right now, if lifetime intrinsics are enabled for smaller objects a self-hosted build is broken. The question is whether this is due to a misunderstanding of the lifetime spec or just a hidden bug. Reid Kleckner suggests different intrinsics for the simple case of stack allocated data. He also clarifies what he means by stack colouring. This is followed by some in-depth back-and-forth discussion on the validity of transformations involving lifetime.start/lifetime.end and whether new intrinsics are required.
James Molloy has been experimenting with the scheduling model on the Cortex-A57 and found some oddities. I noted the MicroOpBufferSize is currently set to 128, and reducing it right down to 2 seems to have no effect. Andrew Trick responded with some suggestions on implementing a custom scheduling strategy.
Volodymyr Kuznetsov and his collaborators are asking for feedback on their patchset to implement their recently published work on control flow hijacking protection. The OSDI paper is available here. The current patchset covers the stack protection aspect of the paper, providing stronger protection than stack cookies at a lower overhead.
Frédéric Riss is interested in reimplementing Darwin's dsymutil as an lld helper. dsymutil is a standalone DWARF linker which is used to load, merge, and optimize DWARF debug info and write it out to a .dSYM file.

LLVM commits

The PBQP register allocator has had its spill costs and coalescing benefits tweaked. This apparently results in a few percent improvement on benchmarks such as EEMBC and SPEC. r221292, r221293.
The new SymbolRewriter pass is an IR to IR transformation allowing adjustment of symbols during compilation. It is intended to be used for symbol interpositioning in sanitizers and performance analysis tools. r221548.
Hexagon gained a basic ELF object emitter. r221465.
llvm-vtabledump gained support for the Itanium ABI. r221133.
LLVM's CMake build system gained the LLVM_BUILD_STATIC option. r221345.
The usage of Inputs/ for extra test files has been documented. r221406.
The MIPS backend has reached a milestone in support for the N32/N64 ABI. This commit fixes all known bugs for this ABI and the first 10000 tests generated by ABITest.py pass. r221534.

Clang commits

clang-format gained various improvements for formatting Java code. r221104, r221109, and others.
Support was added for C++1z nested namespace definitions, u8 character literals, and attributes on namespaces or enumerators. r221574, r221576, r221580.
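
As a rough illustration of the C++1z (later C++17) features mentioned above, the following sketch shows a nested namespace definition, a u8 character literal, and attributes on a namespace and an enumerator. All names (app::net::detail, legacy_net, Protocol) are hypothetical and chosen only for this example; it assumes a compiler running in C++17 mode or later.

```cpp
// Nested namespace definition: shorthand for
// namespace app { namespace net { namespace detail { ... } } }
namespace app::net::detail {
inline int default_port() { return 8080; }
}

// u8 character literal: a single UTF-8 code unit.
// Its type is char in C++17 and char8_t from C++20 onward, hence `auto`.
constexpr auto letter_a = u8'A';

// Attribute on a namespace (hypothetical legacy namespace).
namespace [[deprecated("use app::net instead")]] legacy_net {
inline int default_port() { return 80; }
}

// Attribute on an enumerator.
enum class Protocol {
  Http [[deprecated("prefer Https")]] = 0,
  Https = 1,
};

// Small helpers that use only the non-deprecated entities.
int port_in_use() { return app::net::detail::default_port(); }
int code_of_letter_a() { return static_cast<int>(letter_a); }
```

Referring to legacy_net or Protocol::Http from other code would be expected to produce a deprecation warning, which is the point of allowing attributes in those positions.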

Other project commits

LLD learned how to parse most linker scripts. Before getting too excited, do note this is parsing only; semantic actions will come in the future. r221126.
The common Sanitizer code gained a generic stack frame renderer. This allows the user to control the format of stack frame output. r221409, r221469.
The basic framework for live debugging on Windows was added to LLDB. It will detect changes such as DLL loads and unloads etc, but these need to be propagated through LLDB properly. r221207.
lldb-gdbserver now supports the Android target. r221570.

LLVM Weekly - #44, Nov 3rd 2014

Mon, 03 Nov 2014 03:29:00 +0000

Welcome to the forty-fourth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The 2014 LLVM Dev meeting was held last week. I couldn't make it, but it seems like there was a great selection of talks. Sadly the keynote about Swift's high-level IR was cancelled. No word yet on when we can expect slides and videos online. However, slides by Philip Reames and Sanjoy Das from their talk on implementing fully relocating garbage collection in LLVM are online.

Peter Zotov has been doing lots of work on the LLVM OCaml bindings recently, and is looking for additional help. Recently, he's closed almost all open bugs for the bindings, migrated them to ocamlfind, fixed Llvm_executionengine, and ensured pretty much the whole LLVM-C API is exposed. Tasks on the todo list include writing tests in OUnit2 format, migrating the Kaleidoscope tutorial off camlp4, and splitting up and adding OCaml bindings to this patch. More ambitiously, it would be interesting to write LLVM passes in OCaml and to represent LLVM IR as a pure AST. If any of this interests you, do get in touch with Peter. He's able to review any patches, but could do with help on working through this list of new features.

The LLVM Bay Area monthly social is going to be held on 6th November.

On the mailing lists

Reid Kleckner has proposed dropping support for running LLVM on Windows XP. This would allow the use of system APIs only available in Vista and above. Thus far all responses have been positive, with one even suggesting raising the minimum to Windows 7.
Tom Stellard suggests deprecating the autoconf build system. Right now there is both an autotools based system and a CMake system, though CMake seems most used by developers for LLVM at least. Bob Wilson points out that the effort required to keep the existing makefiles working is much less than what might be needed to update the CMake build to support all use cases. Other replies, though, make it seem that the CMake build supports pretty much all configurations people use now. If there are people who actually enjoy fiddling with build systems (far-fetched, I know), it seems like a little effort could go a long way and allow the makefile system to be jettisoned.
Betul Buyukkurt has posted an RFC on indirect call target profiling. The goal is to use the collected data for optimisation. Kostya Serebryany described how it can be used to provide feedback to fuzzers and detailed properties that would be useful for this use case.
Chris Matthews announces that a new Jenkins-based OSX build cluster is up and running. This includes multiple build profiles and an O3 LTO performance tracker. The Jenkins config should be committed to zorg soon.

LLVM commits

Support for writing sampling profiles has been committed. In the future, support to read (and maybe write) profiles in GCC's gcov format will be added, and llvm-profdata will get support to manipulate sampling profiles. r220915.
A comment has been added to X86AsmInstrumentation to describe how asm instrumentation works. r220670.
The Microsoft vectorcall calling convention has been implemented for x86 and x86-64. r220745.
The C (and OCaml) APIs gained functions to query and modify branches, and to obtain the values for floating point constants. There have been a whole bunch of additional commits related to the OCaml bindings, too many to pick out anything representative. r220814, r220815, r220817, r220818.
The loop and SLP (superword level parallelism) vectorizers are now enabled in the Gold plugin. r220886, r220887.

Clang commits

A refactoring of libTooling to reduce required dependencies means that clang-format's binary is now roughly half the size. r220867.

Other project commits

lldb has started to adopt the StringPrinter API. r220894.
Initial support for PowerPC/PowerPC64 on FreeBSD has been added to LLDB. r220944.

LLVM Weekly - #43, Oct 27th 2014

Mon, 27 Oct 2014 04:02:00 +0000

Welcome to the forty-third issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

This week it's the LLVM Developers' Meeting in San Jose. Check out the schedule. Unfortunately I won't be there, so I'm looking forward to the slides and videos going online.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Philip Reames has written up a detailed discussion of statepoints vs gcroot for representing call safepoints. The aim is to clearly explain how the safepoint functionality provided by the patches currently up for review differs from the current gc.root support.

The Haskell community have put together a proposal for an improved LLVM backend to GHC. They intend to ship GHC with its own local LLVM build.

CoderGears have published a blog post about using Clang to get better warnings in Visual C++ projects.

There is going to be a dedicated LLVM devroom at FOSDEM 2015. Here is the call for speakers and participation.

On the mailing lists

Elena Demikhovsky has asked for comments on a proposal to add masked vector load and store intrinsics. Essentially all feedback so far is positive on the idea.
Renato Golin proposes moving libunwind into compiler-rt. One of the subtleties is that libunwind isn't fully compatible with GCC's unwind implementation (due to different data structure layouts), which means they can't be mixed.
Kristof Beyls has posted some notes in preparation for the benchmarking infrastructure BoF at the LLVM dev meeting.
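
To make the idea of masked vector loads and stores from the first item above more concrete, here is a scalar sketch of the intended semantics. It is not the proposed intrinsic signature: the function names, the use of std::array, and the pass-through operand for disabled lanes are all assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>

// Masked load: lanes whose mask bit is set are read from memory; the other
// lanes take the value from `passthru` (an assumed operand) and the memory
// behind them is never touched.
template <typename T, std::size_t N>
std::array<T, N> masked_load(const T *ptr, const std::array<bool, N> &mask,
                             const std::array<T, N> &passthru) {
  std::array<T, N> result = passthru;
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      result[i] = ptr[i];
  return result;
}

// Masked store: only lanes whose mask bit is set are written to memory.
template <typename T, std::size_t N>
void masked_store(T *ptr, const std::array<bool, N> &mask,
                  const std::array<T, N> &value) {
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      ptr[i] = value[i];
}
```

The property that matters to a vectorizer is that disabled lanes perform no memory access at all, so a loop with a conditional load or store can be vectorized without introducing faults or data races on the skipped elements.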

LLVM commits

The nonnull metadata has been introduced for Load instructions. r220240.
minnum and maxnum intrinsics have been added. r220341, r220342.
The Hexagon backend gained a basic disassembler. r220393.
PassConfig gained usingDefaultRegAlloc to tell if the default register allocator is being used. r220321.
An llvm-go tool has been added. It is intended to be used to build components such as the Go frontend in-tree. r220462.
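
For the minnum and maxnum intrinsics listed above, the key point is NaN handling: if exactly one operand is NaN, the other operand is returned, which to my understanding matches libm's fmin and fmax. The sketch below uses std::fmin and std::fmax as stand-ins and contrasts them with a naive comparison; the function names are hypothetical.

```cpp
#include <cmath>

// Stand-ins for the minnum/maxnum semantics: a single NaN operand is ignored.
double min_ignoring_nan(double a, double b) { return std::fmin(a, b); }
double max_ignoring_nan(double a, double b) { return std::fmax(a, b); }

// Naive minimum for contrast: any comparison with NaN is false, so the
// result depends on which side the NaN is on.
double naive_min(double a, double b) { return a < b ? a : b; }
```

With a NaN as the second argument, naive_min returns the NaN while min_ignoring_nan returns the other operand, which is why a plain compare-and-select cannot always be substituted for these intrinsics.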

Clang commits

C compilation now defaults to C11, matching the behaviour of GCC 5.0. r220244.
Clang should now be better at finding Visual Studio in non-standard setups. r220226.
The Windows toolchain is now known as MSVCToolChain, to allow the addition of a CrossWindowsToolChain which will use clang/libc++/lld. r220362, r220546.

Other project commits

libcxxabi gained support for running the libc++abi tests with sanitizers. r220464.

LLVM Weekly - #42, Oct 20th 2014

Mon, 20 Oct 2014 04:31:00 +0000

Welcome to the forty-second issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

If you're local to London, you may be interested to know that I'll be talking about lowRISC at the Open Source Hardware User Group on Thursday.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

ELLCC, the LLVM-based cross-compilation toolchain now has pre-built binaries for all LLVM tools.

Eli Bendersky's repository of examples for using LLVM and Clang as libraries and for building new passes isn't new, but it is incredibly useful for newcomers to LLVM/Clang and I haven't featured it before. If you want to build something using LLVM or Clang, the llvm-clang-samples repo is one of the best places to start.

On the mailing lists

If you enjoy bikeshedding, I have the perfect thread for you. Should LLVM change its naming convention for variables? There actually seems to be a lot of consensus that the current approach of using capitalized variable names is weird.
Richard Smith has proposed switching the default C language mode from gnu99 to gnu11. GCC trunk has just switched from gnu89 by default to gnu11. There seems to be almost universal support for gnu11 by default.
Junio Cezar writes to the mailing list to share his experiments on time taken in various LLVM passes. His webpage has plots of time taken in each stage for csmith-generated programs. Hal Finkel had some suggestions on improving the analysis.
Bill Wendling is stepping down as LLVM release manager. He nominated Tom Stellard and Hans Wennborg as his replacements, who have been accepted by unanimous agreement.
Chandler Carruth suggests making DataLayout non-optional.

LLVM commits

Go LLVM bindings have been committed. r219976.
Invoking patchpoint intrinsics is now supported. r220055.
LLVM gained a workaround for a Cortex-A53 erratum. r219603.
Basic support for ARM Cortex-A17 was added. r219606.
The C API has been extended with the LLVMWriteBitcodeToMemoryBuffer function. r219643.
NumOperands has been moved from User to Value. On 64-bit host architectures this reduces sizeof(User) and subclasses by 8. r219845.
The LLVMParseCommandLineOptions function was added to the C API. r219975.

Clang commits

Constant expressions can now be used in pragma loop hints. r219589.
The libclang API gained a function to retrieve the storage class of a declaration. r219809.
With the -fsanitize-address-field-padding flag, Clang can insert poisoned paddings between fields in C++ classes to allow AddressSanitizer to find intra-object overflow bugs. r219961.
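
As an illustration of constant expressions in loop hints (the first item above), the sketch below passes a non-type template parameter to a Clang loop pragma. The function names are hypothetical, and compilers other than Clang are expected to ignore the pragma, possibly with a warning.

```cpp
#include <cstddef>

// The vectorization width is a template parameter rather than a literal,
// which is the kind of constant expression the pragma can now accept.
template <int Width>
int sum_with_hint(const int *data, std::size_t count) {
  int total = 0;
#pragma clang loop vectorize_width(Width)
  for (std::size_t i = 0; i < count; ++i)
    total += data[i];
  return total;
}

// Example instantiation with a requested width of 4.
int sum_of_first_eight() {
  static const int values[8] = {1, 2, 3, 4, 5, 6, 7, 8};
  return sum_with_hint<4>(values, 8);
}
```

The hint only guides the optimizer; the computed result is the same whether or not the loop is actually vectorized.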

Other project commits

lldb now supports a gdb-style batch mode. r219654.

LLVM Weekly - #41, Oct 13th 2014

Tue, 14 Oct 2014 03:59:00 +0000

Welcome to the forty-first issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

I've been in Munich for ORCONF this weekend. Slides from my talk about lowRISC are available here.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

ELLCC, the LLVM/Clang-based cross development toolkit now has Windows binaries available.

IBM have posted a bounty on fixing the AddressSanitizer tests that fail on PowerPC.

GCC needs you! A large number of potential starting points for new contributors has been posted to the GCC mailing list.

On the mailing lists

Hayden Livingston is curious about examples of LLVM usage for whole program optimisations, mentioning LLVM JIT functionality and GC as areas of interest. Philip Reames responded with a good description of the current state and also noted his patchset for GC statepoint intrinsics is up for review and should hopefully be merged in the coming weeks. Filip Pizlo, who worked on Apple's FTL JS JIT, responded to advocate use of a Bartlett-style mostly-copying collector.
Peter Collingbourne has proposed official Go bindings be added to the LLVM project. Thus far all replies seem positive.
Saleem Abdulrasool points out that lld doesn't conform to the LLVM/Clang coding style. As you can imagine, few topics attract more feedback from developers than whitespace and variable naming conventions so the thread is rather long. There's general agreement that it would be better if lld used the LLVM style, though unease about moving over in a single large patch on the basis that this would dirty commit history and make git/svn blame less useful. A patch was submitted to git some years ago to implement the ability to ignore certain shas in git blame but it seems the feature was never added.

LLVM commits

Switches with only two cases and a default are now optimised to a couple of selects. r219223.
llvm-symbolizer will now be used to symbolize LLVM/Clang crash dumps. r219534.
The calculation of loop trip counts for loops with multiple exits has been de-pessimized. r219517.
MIPS fast-isel learnt integer and floating point compare and conditional branches. r219518, r219530, r219556.
R600 gained a load/store machine optimizer pass. r219533.
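
The switch-to-select change in the first item above can be pictured at the source level roughly as follows. This is only a sketch of the idea: the functions and constants are made up, and the IR actually produced by the optimisation may look different.

```cpp
// A switch with exactly two cases and a default, each yielding a constant.
int classify_switch(int x) {
  switch (x) {
  case 10:
    return 1;
  case 20:
    return 2;
  default:
    return 0;
  }
}

// The same logic as two chained selects (conditional expressions), which
// avoids a jump table or a chain of conditional branches.
int classify_select(int x) {
  int result = 0;                    // default value
  result = (x == 10) ? 1 : result;   // first select
  result = (x == 20) ? 2 : result;   // second select
  return result;
}
```

Both functions return the same value for every input; the second form simply exposes the branch-free shape.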

Clang commits

The integrated assembler has been turned on by default for ppc64 and ppc64le. r219129.
clang-format's interpretation of special comments to disable formatting within a delimited range has been documented. r219204.
The integrated assembler has been turned on by default for SystemZ. r219426.
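
The special comments referred to in the clang-format item above are `// clang-format off` and `// clang-format on`. A minimal sketch of their use, with a hypothetical hand-aligned table that would otherwise be reflowed:

```cpp
#include <array>

// clang-format off
// Kept in a 3x3 layout on purpose; clang-format leaves this region alone.
constexpr std::array<int, 9> kIdentity3x3 = {
    1, 0, 0,
    0, 1, 0,
    0, 0, 1,
};
// clang-format on

// Sum of the diagonal of a row-major 3x3 matrix.
int trace3x3(const std::array<int, 9> &m) { return m[0] + m[4] + m[8]; }
```

Everything between the two comments is left exactly as written, while the rest of the file is formatted normally.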

Other project commits

lld gained support for 'fat' mach-o archives. r219268.
The lldbtk example has seen some further development. r219219.
lldb-gdbserver can now be used for local-process Linux debugging. r219457.
The disassembly format for lldb can now be customized. r219544.

LLVM Weekly - #40, Oct 6th 2014

Mon, 06 Oct 2014 05:49:00 +0000

Welcome to the fortieth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

I'll be in Munich next weekend for the OpenRISC conference where I'll be presenting on the lowRISC project to produce an open-source SoC. I'll be giving a similar talk in London at the Open Source Hardware User Group on 23rd October.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Capstone 3.0 RC1 has been released. Capstone is an open source disassembly engine, based initially on code from LLVM. This release features support for Sparc, SystemZ and XCore as well as the previously supported architectures. Among other changes, the Python bindings are now compatible with Python 3.

An interesting paper from last year came up on the mailing list. From EPFL, it proposes adding -OVERIFY to optimise programs for fast verification. The performance of symbolic execution tools is improved by reducing the number of paths to explore and the complexity of branch conditions. They managed a maximum 95x reduction in total compilation and analysis time.

The next Cambridge (UK) social will take place on Wed 8th Oct at 7.30 pm.

On the mailing lists

Reid Kleckner has posted an RFC on approaches to representing structured exception handling (SEH) in LLVM IR. This is the exception handling model used on Windows.
Chandler Carruth has written to the list to announce his new x86 vector shuffle lowering path is now enabled by default. This code path has seen extensive fuzz testing. The performance improvement is largest on AMD chips with older SSE versions. If anyone is able to find a performance regression, you are encouraged to report it.
Richard Pennington, who maintains the Clang/LLVM ELLCC cross-development toolchain, is considering dropping support for Microblaze. The Microblaze backend was dropped from LLVM last year, but Richard has been maintaining it out of tree. However, there seems to be very little actual interest. If somebody wants to pick it up, now is the time to jump in.

LLVM commits

The expansion of atomic loads/stores for PowerPC has been improved. r218922. The documentation on atomics has also been updated. r218937.
For the past few weeks, Chandler Carruth has been working on a new vector shuffle lowering implementation. There have been too many commits to summarise, but the time has come and the new codepath is now enabled by default. It claims 5-40% improvements in the right conditions (when the loop vectorizer fires in the hot path for SSE2/SSE3). r219046.
The Cortex-A57 scheduling model has been refined. r218627.
SimplifyCFG now has a configurable threshold for folding branches with common destination. Changing this threshold can be worthwhile for GPU programs where branches are expensive. r218711.
Basic support for the newly-announced Cortex-M7 has been added. r218747.
As discussed on the mailing list last week, the sqrt intrinsic will now return undef when given a negative input. r218803.
llvm-readobj learnt -coff-imports which will print out the COFF import table. r218891, r218915.

Clang commits

Support for the align_value attribute has been added, matching the behaviour of the attribute in the Intel compiler. The commit message explains why this attribute is useful in addition to aligned. r218910.
A rather useful diagnostic has been added. -Winconsistent-missing-override will warn if override is missing on an overridden method if that class has at least one override specified on its methods. r218925.
Support for MS ABI continues. thread_local is now supported for global variables. r219074.
Matcher and DynTypedMatcher saw some nice performance tweaking, resulting in a 14% improvement on a clang-tidy benchmark and compilation of Dynamic/Registry.cpp sped up by 17%. r218616.
lifetime.start and lifetime.end markers are now emitted for unnamed temporary objects. r218865.
The __sync_fetch_and_nand intrinsic was re-added. See the commit message for a history of its removal. r218905.
Clang gained its own implementation of C11 stdatomic.h. The system header will be used in preference if present. r218957.
Clang now understands -mthread-model to specify the thread model to use, e.g. posix, single (for bare-metal and single-threaded targets). r219027.
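
To show the situation that -Winconsistent-missing-override (listed above) is aimed at, here is a small sketch with hypothetical classes. One overriding method is marked override and another is not, which is the inconsistency the diagnostic is described as reporting.

```cpp
struct Shape {
  virtual ~Shape() = default;
  virtual int sides() const { return 0; }
  virtual int corners() const { return 0; }
};

struct Triangle : Shape {
  // Marked with override, so the class has at least one explicit override.
  int sides() const override { return 3; }
  // Also overrides a base method but lacks `override`; this is the line
  // the warning would be expected to flag.
  int corners() const { return 3; }
};

// Calls through the base class to show that both methods really override.
int sides_via_base(const Shape &shape) { return shape.sides(); }
int corners_via_base(const Shape &shape) { return shape.corners(); }
```

The code is valid either way; adding override to corners() silences the warning and lets the compiler catch a future signature mismatch with the base class.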

Other project commits

libcxxabi should now work with the ARM Cortex-M0. r218869.
lldb gained initial support for scripting stepping. This is the ability to add new stepping modes implemented by Python classes. The example in the follow-on commit has a large comment at the head of the file to explain its operation. r218642, r218650.

The LLVM Project Blog

GSoC 2024: Out-Of-Process Execution For Clang-Repl

Project Background

What We Accomplished

Out-Of-Process Execution Support for Clang-Repl

Issues Encountered

ORC JIT Enhancements

Additional Improvements

Benchmarks: In-Process vs Out-of-Process Execution

Result

Future Work

Conclusion

Acknowledgements

Related Links

GSoC 2024: The 1001 thresholds in LLVM

Background

What We Did

Results

Future Work

Acknowledgements

Links

GSoC 2024: 3-way comparison intrinsics

Background

What was done

Results

Future Work

Acknowledgements

GSoC 2024: ABI Lowering in ClangIR

Goals

Contributions

Target Lowering Library

Calling Convention Lowering Pass

Shortcomings

Target-Specific Lowering Unification

Inclusion in the Main Pipeline

Future Work

Acknowledgements

GSoC 2024: Statistical Analysis of LLVM-IR Compilation

Background

Summary of Work

Current Status

Future Work

Acknowledgements

Links

GSoC 2024: Reviving NewGVN

Background

Implementing PRE

Missing Features

Results

Future Work

GSoC 2024: Compile GPU kernels using ClangIR

Background

What We Did

Results

Future Works

Acknowledgements

Appendix

GSoC 2024: Half-precision in LLVM libc

Work done

Work left to do

Acknowledgements

GSoC 2024: GPU Libc Benchmarking

Background

What We Did

Results

Future Work

Acknowledgements

Links

LLVM Google Summer of Code 2024 & 2023

GSoC 2024

1. Project ideas

2. Way to submitting a proposal

3. Useful links

4. Deadlines

Another step forward towards interactive programming

Yuquan Fu - Autocompletion in Clang-REPL

Example – avoiding tedious typing

Anubhab Ghosh - WebAssembly Support for Clang-Repl

Example:

Sunho Kim - Re-optimization using JITLink

`alignas` specifier

`-Wformat`