Binutils Porting Guide To A New Target Architecture

The binutils project includes a 100-page guide to its internals. This article, by contrast, is aimed primarily at those wishing to develop or port binutils for the first time. The internals guide suffers from the following limitations:

  1. It tends to document at a detailed level. So individual functions are described well, but it is hard to get the big picture.
  2. It is incomplete. Many of the most useful sections (for example, details of final relocation) are yet to be written.

Developers who need to port binutils to a new architecture are faced with discovering how binutils works by reading the source code and looking at how other architectures have been ported.

I went through that process myself when porting binutils to the Compact-RISC (a.k.a. CR16) architecture. This article aims to capture that learning experience, with the intention of helping others, especially those looking to port the binutils tools to a new target.

Further sources of information
Written documentation

The main user guides for the binutils tools provide a great deal of context about how they are intended to work. The document on the binutils internals is essential reading, before and during any porting exercise. It is not complete, nor is it always up to date, but it is the first place to look for an explanation of what a particular function does.

The binutils tools rely upon a separate specification of the Binary File Descriptor for each architecture. This has its own comprehensive user guide.

The main binutils tools code base is generally well commented, particularly in the headers for the major interfaces. Inevitably, this is the definitive place to find out exactly how a particular function is expected to behave.

Other information channels

The main website for binutils tools is at http://sourceware.org/binutils. The binutils developer community communicates through the binutils mailing lists. These are always good places to find solutions to a problem.

The main mailing list for discussions is binutils at sourceware dot org, although for a detailed understanding, reading the bug reporting mailing list bug-binutils at sourceware dot org is also recommended. See the main binutils website for details about subscribing to these mailing lists.

Binutils file organisation structure

The bulk of the binutils source code is in a small number of directories. Some components of binutils are libraries that are used internally as well as in other projects. For example, the BFD library is used in the GNU GDB debugger. These libraries have their own top-level directories. The main directories are shown in Figure 1.

Figure 1: Directory structure

Here, in brief, is some information on these directories:

  • include/ contains the header files for information that straddles major components. For example, the main simulator interface header is here (remote-sim.h), because it links GDB (in the gdb/ directory) to the simulators (in the sim/ directory). Other headers, specific to a particular component, reside in the directory of that component.
  • bfd/ contains the Binary File Descriptor library. This library contains code to handle specific binary file formats, such as ELF, COFF, SREC and so on. If a new object file type must be recognised, code to support it should be added here.
  • opcodes/ contains the opcodes library. This has information on how to assemble and disassemble instructions.
  • cpu/ contains CPU description files for a utility called CGEN. CGEN is a tool that can be used to automatically generate target-specific source files for the opcodes library, as well as for the SIM simulator used by GDB.
  • binutils/: Despite the name, this is not the main binutils directory. Rather, it is the directory for all of the binutils tools that do not have their own top-level source directory. This includes tools such as objcopy, objdump and readelf, amongst others.
  • gas/ contains the GNU assembler. Target-specific assembler code is held in the config/ sub-directory.
  • ld/ contains the GNU linker. Target-specific linker files are held in sub-directories.
  • gprof/ contains the GNU profiler. This program does not have any target-specific code.
  • gold/ contains gold, a new GNU linker being created to replace LD. At the moment it is still in development.
  • elfcpp/ contains elfcpp, a C++ library for reading and writing ELF information. It is currently only used by the GOLD linker.

In addition, there are a couple of other directories that you can find at the top level of a binutils source release. They are used in the binutils build process, but are not part of the binutils project:

  • intl/ contains the GNU gettext library.
  • libiberty/: Before POSIX and glibc, this was a GNU project to provide a set of standard functions. It lives on in binutils. Most valuable are its free store management and argument parsing functions.

Main functional areas and data structures

Binary file description

The Binary File Descriptor (BFD) library is a package that allows applications to use the same routines to operate on object files, whatever the object file format. A new object file format can be supported simply by creating a new BFD backend and adding it to the library.

The BFD library backend creates a number of data structures describing the data held in a particular type of object file. Ultimately, a unique enumerated constant (of type enum bfd_architecture) is defined for each individual architecture. This constant is then used to access the various data structures associated with that particular architecture.

In the case of the Compact-RISC, 16-bit implementation (which may be a COFF or ELF binary), the enumerated constant is bfd_arch_cr16. It appears in the various structures that describe the architecture, for example in the bfd_cr16_arch structure:

  const bfd_arch_info_type bfd_cr16_arch =
    {
      16,               /* 16 bits in a word.  */
      32,               /* 32 bits in an address.  */
       8,               /*  8 bits in a byte.  */
      bfd_arch_cr16,    /* enum bfd_architecture arch.  */
      bfd_mach_cr16,    /* Machine value, used to distinguish between cr16 variants.  */
      "cr16",           /* Architecture name (short version).  */
      "cr16",           /* Architecture name (long version).  */
      1,                /* Section alignment power.  */
      TRUE,             /* True if this is the default machine for the architecture.  */
      bfd_default_compatible, /* Function to call to determine if two different architectures are compatible.  */
      bfd_default_scan, /* Function to call to determine if a given string matches this architecture.  */
      NULL,             /* Pointer to the next CR16 machine architecture.  */
    };

This particular structure is defined in a file called cpu-<target>.c in the bfd/ directory.
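
To make the role of this structure more concrete, here is a minimal, self-contained sketch of how a library might find an architecture record by name. The struct arch_info, the function find_arch, the enum values and the second "demo32" record are hypothetical stand-ins invented for illustration. The real bfd_arch_info_type has more fields, and BFD does the matching through the scan callback shown in the structure above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for bfd_arch_info_type.  */
enum arch_id { ARCH_UNKNOWN, ARCH_CR16, ARCH_DEMO32 };

struct arch_info
{
  int bits_per_word;
  int bits_per_address;
  int bits_per_byte;
  enum arch_id arch;
  const char *printable_name;
  const struct arch_info *next;   /* Next record in the list, or NULL.  */
};

/* An invented second architecture, so the list has more than one entry.  */
const struct arch_info demo32_arch =
  { 32, 32, 8, ARCH_DEMO32, "demo32", NULL };

/* Same word, address and byte sizes as the CR16 record in the article.  */
const struct arch_info cr16_arch =
  { 16, 32, 8, ARCH_CR16, "cr16", &demo32_arch };

/* Walk the list and return the record whose name matches, or NULL.  */
const struct arch_info *
find_arch (const struct arch_info *list, const char *name)
{
  assert (name != NULL);
  for (; list != NULL; list = list->next)
    if (strcmp (list->printable_name, name) == 0)
      return list;
  return NULL;
}
```

In this sketch, find_arch (&cr16_arch, "cr16") returns the address of cr16_arch, and an unknown name yields NULL.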

The file <file_format>-<target>.c (for example, elf32-cr16.c) is used to provide target-specific support for the given file format and architecture. At the very least, it provides the following information:

  1. A reloc_map array that maps BFD relocation enumerations into a target-specific relocation type.
  2. A reloc_howto_type array with target-specific relocation details. Here is an example array entry from the cr16 port:
      HOWTO (R_CR16_NONE,              /* Type.  */
             0,                        /* Rightshift.  */
             2,                        /* Size.  */
             32,                       /* Bitsize.  */
             FALSE,                    /* PC_relative.  */
             0,                        /* Bitpos.  */
             complain_overflow_dont,   /* Complain_on_overflow.  */
             bfd_elf_generic_reloc,    /* Special_function.  */
             "R_CR16_NONE",            /* Name.  */
             FALSE,                    /* Partial_inplace.  */
             0,                        /* Src_mask.  */
             0,                        /* Dst_mask.  */
             FALSE),                   /* PCREL_offset.  */
  3. Definitions of the macros below, with settings specific to the target:
    #define TARGET_LITTLE_SYM
    #define TARGET_LITTLE_NAME
    
    #define ELF_ARCH
    #define ELF_MACHINE_CODE
    #define ELF_MAXPAGESIZE
    #define elf_symbol_leading_char
    
    #define elf_info_to_howto
    #define elf_info_to_howto_rel
    
    #define elf_backend_relocate_section
    #define elf_backend_gc_mark_hook
    #define elf_backend_gc_sweep_hook
    #define elf_backend_can_gc_sections
    #define elf_backend_rela_normal
    #define elf_backend_check_relocs
    #define elf_backend_final_write_processing
    #define elf_backend_object_p
    #define elf_backend_create_dynamic_sections
    #define elf_backend_adjust_dynamic_symbol
    #define elf_backend_size_dynamic_sections
    #define elf_backend_omit_section_dynsym
    #define elf_backend_finish_dynamic_sections
    #define elf_backend_reloc_type_class
    #define elf_backend_want_got_plt
    #define elf_backend_plt_readonly
    #define elf_backend_want_plt_sym
    #define elf_backend_got_header_size
    
    #define bfd_elf32_bfd_reloc_type_lookup
    #define bfd_elf32_bfd_reloc_name_lookup
    #define bfd_elf32_bfd_relax_section
    #define bfd_elf32_bfd_get_relocated_section_contents
    #define bfd_elf32_bfd_merge_private_bfd_data
    #define bfd_elf32_bfd_link_hash_table_create
    #define bfd_elf32_bfd_link_hash_table_free
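
The first two items in the list above can be pictured with a small, self-contained sketch. Every name here (generic_reloc, target_reloc, demo_reloc_map, demo_howto, lookup_howto, apply_field) is hypothetical, and the howto is reduced to three fields. In a real port the map pairs BFD relocation enumerations with the target's own relocation types, and BFD applies relocations with far more care (overflow checks, PC-relative adjustment, in-place addends), so treat this only as an illustration of the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generic relocation codes (stand-ins for BFD's enumeration).  */
enum generic_reloc { GEN_NONE, GEN_ABS16, GEN_ABS32 };

/* Hypothetical target relocation types.  */
enum target_reloc { R_DEMO_NONE, R_DEMO_16, R_DEMO_32, R_DEMO_MAX };

struct reloc_map_entry
{
  enum generic_reloc generic;
  enum target_reloc target;
};

/* Maps each generic code to the target-specific relocation type.  */
const struct reloc_map_entry demo_reloc_map[] =
{
  { GEN_NONE,  R_DEMO_NONE },
  { GEN_ABS16, R_DEMO_16 },
  { GEN_ABS32, R_DEMO_32 },
};

/* Cut-down howto: how far to shift the value and which bits to patch.  */
struct demo_howto
{
  enum target_reloc type;
  unsigned int rightshift;
  uint32_t dst_mask;
};

/* Indexed by target relocation type.  */
const struct demo_howto demo_howto_table[R_DEMO_MAX] =
{
  { R_DEMO_NONE, 0, 0x00000000 },
  { R_DEMO_16,   0, 0x0000ffff },
  { R_DEMO_32,   0, 0xffffffff },
};

/* Translate a generic code to its howto entry, or NULL if unsupported.  */
const struct demo_howto *
lookup_howto (enum generic_reloc code)
{
  size_t i;

  for (i = 0; i < sizeof demo_reloc_map / sizeof demo_reloc_map[0]; i++)
    if (demo_reloc_map[i].generic == code)
      return &demo_howto_table[demo_reloc_map[i].target];
  return NULL;
}

/* Patch VALUE into the bits of INSN selected by the howto's dst_mask.  */
uint32_t
apply_field (const struct demo_howto *howto, uint32_t insn, uint32_t value)
{
  assert (howto != NULL);
  value >>= howto->rightshift;
  return (insn & ~howto->dst_mask) | (value & howto->dst_mask);
}
```

A howto with a zero destination mask, like the R_CR16_NONE entry shown earlier, leaves the instruction word untouched in this model.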

In addition, the files archures.c, config.bfd, Makefile.am and targets.c in the bfd/ directory should be updated with the necessary target-specific changes.

opcodes

The opcodes/ directory should have at least two target-specific files: one to assemble the target instructions and the other to disassemble them. The file names are:

  1. <target>-opc.c and <target>-opc.h (header files are optional)
  2. <target>-dis.c and <target>-dis.h (header files are optional)

The <target>-dis.c file includes code to print disassembled instructions, and also code for matching opcodes and operands appropriately.

The configure.in, disassemble.c and Makefile.am files in the opcodes directory also need to have target-specific changes made to them.
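
As a rough illustration of how an opcode table and the matching code in a disassembler fit together, here is a minimal sketch. The 16-bit encoding, the struct demo_opcode and the functions find_opcode and disassemble_word are all invented for this example. They are not the CR16 encoding or the interface used by the opcodes library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical opcode table entry; not the real CR16 encoding.  */
struct demo_opcode
{
  const char *mnemonic;
  uint16_t match;   /* Bits that identify the instruction.  */
  uint16_t mask;    /* Which bits of the word take part in the match.  */
};

/* Invented layout: top 8 bits are the opcode, then two 4-bit registers.  */
const struct demo_opcode demo_opcodes[] =
{
  { "add", 0x1000, 0xff00 },
  { "sub", 0x1100, 0xff00 },
  { "mov", 0x2000, 0xff00 },
};

/* Return the table entry matching WORD, or NULL for an unknown opcode.  */
const struct demo_opcode *
find_opcode (uint16_t word)
{
  size_t i;

  for (i = 0; i < sizeof demo_opcodes / sizeof demo_opcodes[0]; i++)
    if ((word & demo_opcodes[i].mask) == demo_opcodes[i].match)
      return &demo_opcodes[i];
  return NULL;
}

/* Format WORD as text in BUF; returns the number of characters needed.  */
int
disassemble_word (uint16_t word, char *buf, size_t size)
{
  const struct demo_opcode *op = find_opcode (word);

  assert (buf != NULL);
  if (op == NULL)
    return snprintf (buf, size, ".word 0x%04x", (unsigned int) word);
  return snprintf (buf, size, "%s r%u, r%u", op->mnemonic,
                   (unsigned int) ((word >> 4) & 0xf),
                   (unsigned int) (word & 0xf));
}
```

With this invented encoding, the word 0x1123 is formatted as "sub r2, r3", and a word that matches no table entry falls back to a .word directive.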

include

The include/ directory contains target-specific header files, usually in a file format-specific subdirectory. Opcode information is normally held in a target-specific file in the opcode subdirectory. For example:

    include/elf/cr16.h
    include/opcode/cr16.h

binutils

This directory doesn’t require any new target-specific files, but the configure.tgt, Makefile.am and readelf.c files should be updated with any target-specific information needed.

gas

The gas/config/ subdirectory contains the target-specific files for the assembler. The file names are tc-<target>.c and tc-<target>.h.

The above files should include:

  • Macro definitions for the sizes of:
    • Registers
    • Instructions (i.e., maximum size)
    • Operands
  • Operand error types
  • Comment characters that can start a comment anywhere on a line of assembly code
  • Comment characters that start a comment only at the beginning of a line
  • Line separator
  • Target-specific command-line handling:
    • Definitions of multi-character options, if any
    • Processing of machine-dependent command-line options in the md_parse_option function
    • Machine-dependent usage output in the md_show_usage function
  • Redefinitions of assembler directives using md_pseudo_table
  • Functions for getting registers along with type, size, and other information
  • The md_begin function, used to initialise all the machine-dependent tables and other items of the assembler
  • Parse functions:
    • parse_insn
    • parse_operands
    • parse_operand
  • Print functions:
    • print_insn
    • print_operand
    • print_operands
  • Assemble functions:
    • assemble_insn, which assembles a single instruction
    • md_assemble, the first function called to assemble an instruction

In the gas/ directory itself the configure.tgt and Makefile.am files need to be modified to refer to the new files, and to add support for the new target.
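
The parse functions listed above split a line of assembly into a mnemonic and its operands before anything is encoded. The sketch below shows that step in isolation. The struct demo_insn, the size limits and the function demo_parse_insn are hypothetical, and the statement syntax (a mnemonic followed by comma-separated operands) is an assumption made for the example rather than a description of any real tc-<target>.c file.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_OPERANDS 3
#define DEMO_MAX_TEXT 32

/* Hypothetical parsed form of one assembly statement.  */
struct demo_insn
{
  char mnemonic[DEMO_MAX_TEXT];
  char operands[DEMO_MAX_OPERANDS][DEMO_MAX_TEXT];
  int nargs;
};

/* Copy the token at [START, END) into DST with surrounding blanks removed.
   Returns 0 on success, -1 if the token is empty or too long.  */
static int
copy_token (char *dst, const char *start, const char *end)
{
  while (start < end && isspace ((unsigned char) *start))
    start++;
  while (end > start && isspace ((unsigned char) end[-1]))
    end--;
  if (end == start || (size_t) (end - start) >= DEMO_MAX_TEXT)
    return -1;
  memcpy (dst, start, (size_t) (end - start));
  dst[end - start] = '\0';
  return 0;
}

/* Split LINE into a mnemonic and comma-separated operands.
   Returns 0 on success, -1 on a malformed statement.  */
int
demo_parse_insn (const char *line, struct demo_insn *insn)
{
  const char *p, *comma;

  assert (line != NULL && insn != NULL);
  memset (insn, 0, sizeof *insn);

  /* The mnemonic runs up to the first blank.  */
  while (isspace ((unsigned char) *line))
    line++;
  for (p = line; *p != '\0' && !isspace ((unsigned char) *p); p++)
    ;
  if (copy_token (insn->mnemonic, line, p) != 0)
    return -1;

  /* Skip blanks; an instruction may have no operands at all.  */
  while (isspace ((unsigned char) *p))
    p++;
  if (*p == '\0')
    return 0;

  /* Each operand ends at the next comma or at the end of the line.  */
  for (;;)
    {
      if (insn->nargs == DEMO_MAX_OPERANDS)
        return -1;
      comma = strchr (p, ',');
      if (copy_token (insn->operands[insn->nargs],
                      p, comma ? comma : p + strlen (p)) != 0)
        return -1;
      insn->nargs++;
      if (comma == NULL)
        return 0;
      p = comma + 1;
    }
}
```

Given the invented statement "addw $4, r1", this sketch stores "addw" as the mnemonic and "$4" and "r1" as the two operands. A trailing comma with nothing after it is rejected as malformed.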

ld

In scripttempl/<format><target>.sc, define the default linker script file for the architecture.

In emultempl/<target><format>.em, define the default emulation script file. This contains any functions necessary to customise the behaviour of the linker.

In emulparams/<format><target>.sh, define any parameters that can be used to modify the default linker script file.

In the ld/ directory itself, update the configure.tgt file with your new target information, and add build rules for the new target to Makefile.am.

Build and test

Build binutils tools

Building binutils tools requires you to follow the steps shown below:

  • Configure: Run the configure script with the target and prefix options. For example:
    src/configure --target=cr16-elf --prefix=/local/cr16-binutils

    Here, the --target option defines the target you want to build the binutils tools for, and the --prefix option specifies the location of the binutils tools installation directory.

  • Build: Run make to build the binutils tools for the above configured target. For example:
    make all
    make install

    …or:

    make all install

Note: You can use the make -jN option to build the tools quickly by using all processors/CPUs (that is, N) on your host PC. For example, if your host PC has four processors, then use make -j4 for a quick build.

Test binutils tools

You can test the tools built above using the binutils test suites, such as gas, binutils and ld. Running the binutils test suite requires that the DejaGNU package is installed and the DEJAGNU environment variable is set. The tests can then be run with:

make check

The above command runs the binutils, gas and linker test suites.

With the following commands you can run the binutils, gas and linker test suites separately:

make check-binutils
make check-gas
make check-ld

On completion of the binutils test run, a summary of the results will be in the binutils/ directory in the binutils.sum file. More detailed information will also be available in the binutils/binutils.log file. For the gas test suite, the results are in the gas/testsuite/gas.sum and gas/testsuite/gas.log files, and for the linker test suite they are in the ld/ld.sum and ld/ld.log files.
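
DejaGNU records one result per line in a .sum file, with lines such as "PASS: ..." and "FAIL: ...". As a small, hedged illustration of reading such a summary, the sketch below counts those two kinds of line in a text buffer. The struct sum_counts and the function count_results are hypothetical helpers written for this article, and the sample test names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tally of results found in a .sum file.  */
struct sum_counts
{
  int pass;
  int fail;
};

/* Return nonzero if the text at P begins with PREFIX.  */
static int
starts_with (const char *p, const char *prefix)
{
  return strncmp (p, prefix, strlen (prefix)) == 0;
}

/* Count the lines in TEXT that start with "PASS: " or "FAIL: ".
   Other lines (headers, XFAIL, UNSUPPORTED and so on) are ignored.  */
struct sum_counts
count_results (const char *text)
{
  struct sum_counts c = { 0, 0 };
  const char *line = text;

  assert (text != NULL);
  while (*line != '\0')
    {
      const char *nl = strchr (line, '\n');

      if (starts_with (line, "PASS: "))
        c.pass++;
      else if (starts_with (line, "FAIL: "))
        c.fail++;
      if (nl == NULL)
        break;
      line = nl + 1;
    }
  return c;
}
```

In practice the text would be read from a file such as gas/testsuite/gas.sum. File handling is left out here to keep the sketch short.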

For the most comprehensive tests in an environment where the host and target differ, DejaGNU needs some additional configuration. You can achieve this by setting the DEJAGNU environment variable to refer to a suitable configuration file, and defining a custom board configuration file in the directory ~/boards. These configuration files can be used to specify a suitable simulator and how to connect it when running tests.

Documentation

Some binutils subdirectories have their own doc/ subdirectories. The documentation is written in texinfo, from which documents can be generated as PDF, PostScript, HTML or info files. The documentation is not built automatically with make all, nor with make doc. To create the documentation, change to the individual documentation directory and use make html, make pdf, make ps or make info, as required. The main documents of interest are:

  • bfd/doc/bfd.texinfo is the BFD manual
  • binutils/doc/binutils.texi is the main binutils user guide
  • ld/ld.texinfo is the linker user guide
  • gas/doc/as.texinfo is the assembler user guide

The exception to automatic building is with make install. This will build info files for any documents in the binutils/doc/ directory and install them in the info/ subdirectory of the installation directory.

Note: This document is based on the author’s experience to date. It will be updated in future. Suggestions for improvements are always welcome.

All published articles are released under Creative Commons Attribution-NonCommercial 3.0 Unported License, unless otherwise noted.
Open Source For You is powered by WordPress, which gladly sits on top of a CentOS-based LEMP stack.
