.. SPDX-License-Identifier: GPL-2.0

===================================
Using AutoFDO with the Linux kernel
===================================

The kernel supports AutoFDO builds when using the Clang compiler.
AutoFDO (Auto-Feedback-Directed Optimization) is a type of
profile-guided optimization (PGO) used to enhance the performance of
binary executables. It uses hardware sampling to gather information
about how frequently the various code paths within a binary execute.
This data then guides the compiler's optimization decisions, resulting
in a more efficient binary. AutoFDO is a powerful optimization
technique, and measurements indicate that it can significantly improve
kernel performance. It is especially beneficial for workloads affected
by front-end stalls.

Unlike non-FDO builds, AutoFDO builds require the user to supply a
profile, which can be acquired in several ways. AutoFDO profiles are
created by converting hardware samples collected with the "perf" tool.
It is crucial that the workload used to create these perf files is
representative: it must exhibit runtime characteristics similar to
those of the workloads that are intended to be optimized. Otherwise,
the compiler will optimize for the wrong objective.

An AutoFDO profile generally reflects the program's behavior. If the
performance-critical code is architecture-independent, the profile can
be applied across platforms to achieve performance gains. For
instance, building a kernel for AMD machines with a profile generated
on Intel machines can also yield performance improvements.

There are two methods for acquiring a representative profile:

(1) Sample real workloads in a production environment.
(2) Generate the profile using a representative load test.

When the AutoFDO build configuration is enabled but no AutoFDO profile
is provided, the compiler only modifies the DWARF debug information in
the kernel; runtime performance is not affected. It is advisable to
collect the perf profile on a kernel binary built with the same
AutoFDO configuration. A kernel built with different options can be
used, but the resulting profile may yield inferior performance.

Profiles can also be collected on an AutoFDO build of a previous
kernel version. AutoFDO matches profiles using line numbers relative
to the start of each function, which provides some tolerance for
source changes. This mode is commonly used for profile collection in
production environments.

When the profile is collected with a load test, the AutoFDO process
consists of the following steps:

#. Initial build: The kernel is built with AutoFDO options but
   without a profile.

#. Profiling: The above kernel is then run with a representative
   workload to gather execution frequency data. This data is
   collected through hardware sampling with perf. AutoFDO is most
   effective on platforms that support advanced PMU features, such
   as LBR on Intel machines.

#. AutoFDO profile generation: The perf output file is converted to
   an AutoFDO profile with offline tools.

AutoFDO support requires Clang from LLVM 17 or later.

Preparation
===========

Configure the kernel with::

   CONFIG_AUTOFDO_CLANG=y

Customization
=============

By default, CONFIG_AUTOFDO_CLANG covers kernel-space objects in
AutoFDO builds. AutoFDO builds can, however, be enabled or disabled
for individual files and directories by adding a line similar to the
following to the respective kernel Makefile:

- To enable a single file (e.g. foo.o) ::

   AUTOFDO_PROFILE_foo.o := y

- To enable all files in one directory ::

   AUTOFDO_PROFILE := y

- To disable one file ::

   AUTOFDO_PROFILE_foo.o := n

- To disable all files in one directory ::

   AUTOFDO_PROFILE := n

Workflow
========

Here is an example workflow for an AutoFDO kernel:

1)  Build the kernel on the host machine with LLVM enabled. For
    example, configure it with::

      $ make menuconfig LLVM=1

    Then turn on the AutoFDO build option::

      CONFIG_AUTOFDO_CLANG=y

    Alternatively, starting from a configuration with LLVM enabled,
    turn on the option non-interactively with::

      $ scripts/config -e AUTOFDO_CLANG

    Once the configuration is ready, build the kernel with::

      $ make LLVM=1

2) Install the kernel on the test machine.

3) Run the load tests. The '-c' option of perf record specifies the
   sample period, that is, the number of events between consecutive
   samples. We suggest using a suitable prime number, such as 500009.

   - For Intel platforms::

      $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>

   - For AMD platforms:

     The supported systems are Zen3 with BRS and Zen4 with amd_lbr_v2.
     To check for BRS on Zen3::

      $ grep " brs" /proc/cpuinfo

     To check for amd_lbr_v2 on Zen4::

      $ grep amd_lbr_v2 /proc/cpuinfo

     The following command generates the perf data file::

      $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>

4) (Optional) Download the raw perf file to the host machine.

5) To generate an AutoFDO profile, two offline tools are available:
   create_llvm_prof and llvm-profgen. The create_llvm_prof tool is part
   of the AutoFDO project and can be found on GitHub
   (https://github.com/google/autofdo); use version v0.30.1 or later.
   The llvm-profgen tool is included in LLVM itself. Its version does
   not need to match the version of Clang used to build the kernel, but
   it must come from the LLVM 19 release or later, or from LLVM trunk.
   Generate the profile with::

      $ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> -o <profile_file>

   or ::

      $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> --format=extbinary --out=<profile_file>

   Note that multiple AutoFDO profile files can be merged into one via::

      $ llvm-profdata merge -o <profile_file> <profile_1> <profile_2> ... <profile_n>

6) Rebuild the kernel with the AutoFDO profile file, using the same
   configuration as in step 1 (CONFIG_AUTOFDO_CLANG must be enabled)::

      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file>

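Steps 1, 3, 5 and 6 of this workflow can also be scripted. The
following bash helpers are a minimal sketch, not part of the kernel
tree: the function names and the ``PERF``, ``LLVM_PROFGEN`` and
``LLVM_PROFDATA`` override variables are hypothetical, added so that
the tools can be swapped out, for example for stubs. The sketch
assumes that a ``GenuineIntel`` vendor ID implies LBR support, which
may not hold on every system, and it reuses the perf, llvm-profgen
and llvm-profdata command lines shown above::

   #!/usr/bin/env bash
   # Hypothetical helpers for the AutoFDO workflow (not in the kernel tree).
   # Source this file; PERF, LLVM_PROFGEN and LLVM_PROFDATA may be set to
   # replace the tools, e.g. with stubs.

   # Succeed if CONFIG_AUTOFDO_CLANG=y is present in the given .config file.
   autofdo_config_enabled() {
       grep -q '^CONFIG_AUTOFDO_CLANG=y$' "${1:-.config}"
   }

   # Steps 1 and 6: enable AutoFDO on an existing LLVM config and build.
   # Extra arguments go to make, e.g. CLANG_AUTOFDO_PROFILE=<profile_file>.
   autofdo_build() {
       scripts/config -e AUTOFDO_CLANG || return
       make LLVM=1 olddefconfig || return
       if ! autofdo_config_enabled .config; then
           echo "autofdo_build: CONFIG_AUTOFDO_CLANG was dropped" >&2
           return 1
       fi
       make LLVM=1 "$@"
   }

   # Step 3: run perf record with the branch event matching the CPU.
   # Usage: autofdo_perf_record CPUINFO COUNT PERF_FILE LOADTEST [ARGS...]
   autofdo_perf_record() {
       local cpuinfo=$1 count=$2 out=$3
       shift 3
       local -a event
       if grep -q '^vendor_id.*GenuineIntel' "$cpuinfo"; then
           # Assumption: an Intel CPU with LBR.
           event=(-e BR_INST_RETIRED.NEAR_TAKEN:k)
       elif grep -qw -e brs -e amd_lbr_v2 "$cpuinfo"; then
           # Zen3 with BRS or Zen4 with amd_lbr_v2; -w skips "ibrs".
           event=(--pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k)
       else
           echo "autofdo_perf_record: no supported branch sampling" >&2
           return 1
       fi
       "${PERF:-perf}" record "${event[@]}" -a -N -b -c "$count" \
           -o "$out" -- "$@"
   }

   # Step 5: convert each perf file with llvm-profgen, then merge the
   # per-run profiles with llvm-profdata.
   # Usage: autofdo_make_profile VMLINUX PROFILE_FILE PERF_FILE...
   autofdo_make_profile() {
       local vmlinux=$1 out=$2 tmpdir perf rc i=0
       shift 2
       local -a profiles=()
       tmpdir=$(mktemp -d) || return
       for perf in "$@"; do
           i=$((i + 1))
           if ! "${LLVM_PROFGEN:-llvm-profgen}" --kernel --binary="$vmlinux" \
                   --perfdata="$perf" -o "$tmpdir/profile_$i"; then
               rm -rf "$tmpdir"
               return 1
           fi
           profiles+=("$tmpdir/profile_$i")
       done
       "${LLVM_PROFDATA:-llvm-profdata}" merge -o "$out" "${profiles[@]}"
       rc=$?
       rm -rf "$tmpdir"
       return "$rc"
   }

For example, ``autofdo_perf_record /proc/cpuinfo 500009 perf.data <loadtest>``
on the test machine corresponds to step 3, while
``autofdo_make_profile vmlinux <profile_file> perf.data`` followed by
``autofdo_build CLANG_AUTOFDO_PROFILE=<profile_file>`` on the host
corresponds to steps 5 and 6. The ``.config`` check is there because
Kconfig silently drops CONFIG_AUTOFDO_CLANG when its dependencies,
such as building with Clang, are not met.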