All of the interesting technological, artistic or just plain fun subjects I'd investigate if I had an infinite number of lifetimes. In other words, a dumping ground...

Wednesday 26 September 2007

Makefile dependency generation

http://www.cs.washington.edu/orgs/acm/tutorials/dev-in-unix/makefiles.html

ACM Unix Tutorial Notes -- Makefiles
The basics


Make is a program for keeping a set of files up to date. Typically, you will have some C/C++ source files that you edit and, for each source file you edit, you'll have one or more object files that will need to be updated when the source file is changed. However, the files don't have to be C/C++ source files: they could be TeX and PostScript files for a book you are writing, or anything you care to think of.


You tell Make what to do by writing makefiles. Makefiles contain comments, variables, rules, and, sometimes, various conditional and control statements. Comments are pretty simple: they start with a # character, and run to the end of line. We'll meet some conditionals and control statements later in this document, but not for a while, so that leaves us with rules
and variables to talk about.


Let's start with rules. Here is a typical rule:

spam.o : spam.cc ham.h
<TAB> g++ -c spam.cc

As you can see, rules have three parts to them:

TARGET(s) : DEPENDENCIES
<TAB> COMMAND_1
<TAB> COMMAND_2
...
<TAB> COMMAND_N

The TARGET(s) are the files you want to keep up to date. When you change one of the files on the DEPENDENCIES list, you want Make to "think" that TARGET(s) are out-of-date. The COMMAND(s) are the action(s) you want Make to take to remedy the situation. Specifically, given the DEPENDENCIES, it must be possible to produce the TARGET(s) by running the shell commands COMMAND_1 through COMMAND_N. Note that the line(s) on which COMMAND(s) occur must start with the TAB character, otherwise Make will become
confused.
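
To make the out-of-date check concrete, here is a minimal sketch that needs no compiler. It assumes GNU Make is installed; the file names name.txt and greeting.txt are made up for illustration, and cp stands in for a real build command. The sed line only turns the <TAB> markers used in this tutorial into real tab characters. The -q option asks Make whether the target is up to date without running any commands.

```shell
set -e
cd "$(mktemp -d)"                 # work in a scratch directory
tab=$(printf '\t')

# Write a one-rule makefile; sed turns each <TAB> marker into a real tab.
sed "s/^<TAB> */$tab/" > Makefile <<'EOF'
greeting.txt : name.txt
<TAB> cp name.txt greeting.txt
EOF

echo world > name.txt
make -s                           # greeting.txt is missing, so the command runs

# make -q runs nothing; it exits with status 0 only if the target is up to date.
if make -q; then after_build=up-to-date; else after_build=out-of-date; fi

# Pretend greeting.txt was built long before name.txt was last edited.
touch -t 200001010000 greeting.txt
if make -q; then after_touch=up-to-date; else after_touch=out-of-date; fi

echo "after build: $after_build, after touch: $after_touch"
```

The script should report that the target is up to date right after the build, and out of date once its timestamp is older than that of its dependency.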


Getting Make to make breakfast


To get a bit more comfortable with the Make rules concepts, let's write a
simple makefile to make breakfast. (It won't quite work, of course, unless
you can figure out some way to get the computer to actually take the
actions we put in the rules, but I think it'll clarify the concepts).


So, let's say we need (boiled) eggs, toast, and coffee for our breakfast.
Then our breakfast rule will be simply:

breakfast : boiled_eggs toast coffee butter
<TAB> peel boiled_eggs
<TAB> put toast on a plate
<TAB> pour coffee into cup

So far so good, but we don't have boiled_eggs, we only have raw eggs. So we
need another rule:

boiled_eggs : raw_eggs water pot stove
<TAB> pour water into pot
<TAB> put eggs in the water
<TAB> put pot on the stove
<TAB> boil until eggs ready

Now we need a rule to make toast, and another to make coffee. I won't go
through the exercise of writing these rules out: I think you get the idea.


Note how the targets of the rules form a tree: the breakfast target depends
on boiled_eggs, toast, and coffee, and each of these is a target of its own
rule (then, coffee might depend on hot_water, etc.)


A first makefile


With what we have seen, we can build a very simple makefile. Let's say, we
want to build an executable named snakeoil, and the source files are
spam.cc, and eggs.cc. Here's the first makefile, in the style of our
breakfast makefile:

snakeoil : spam.o eggs.o
<TAB> g++ -o snakeoil spam.o eggs.o

spam.o : spam.cc spam.h common.h
<TAB> g++ -g -Wall -c spam.cc

eggs.o : eggs.cc eggs.h common.h
<TAB> g++ -g -Wall -c eggs.cc

One more thing to remember about makefiles is that, when given no
arguments, Make will attempt to build the TARGET(s) of the rule which
occurs the earliest in the makefile (in the example, it's snakeoil). It is
quite common to write a rule whose target is all, which has as its
dependencies all the things you want your makefile to build, and whose list
of commands is empty. Then, you make sure that the all rule is the first
rule of your makefile.
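
The effect of putting an all rule first can be tried out with a small sketch like the one below. It assumes GNU Make is installed; the toast and coffee targets are borrowed from the breakfast example and are built with plain echo commands rather than anything real. The sed line only turns the <TAB> markers into real tab characters.

```shell
set -e
cd "$(mktemp -d)"                 # work in a scratch directory
tab=$(printf '\t')

sed "s/^<TAB> */$tab/" > Makefile <<'EOF'
# "all" comes first, so it is what a bare "make" builds.
all : toast coffee

toast :
<TAB> echo toasted > toast

coffee :
<TAB> echo brewed > coffee
EOF

make -s coffee                    # an explicit target: only coffee is made
if [ -e toast ]; then toast_first=made; else toast_first=missing; fi

make -s                           # no arguments: the first rule, all, is used
if [ -e toast ]; then toast_second=made; else toast_second=missing; fi

echo "after 'make coffee': toast is $toast_first"
echo "after 'make': toast is $toast_second"
```

Asking for coffee by name leaves toast alone; a bare make falls back to all and builds whatever is still missing.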


Rest Stop 1


You now know enough to write simple but functional makefiles. The rest of
this document introduces features of Make which can help you write
makefiles that are shorter, easier to modify, and do more of the routine
work for you. In other words, the following sections are about being lazy.
I try to introduce the more commonly-used features of Make first, followed
by some of the more exotic features. I'd recommend reading at least the
section on Make variables, which follows.


Being Lazy 1: Variables.


In the simple makefile we had produced, we repeated the name of the
compiler we use to compile source code to object files and the flags we
pass to the compiler twice (in the spam.o and the eggs.o rules). If we
later decided we wanted to use a different compiler, or change the flags we
pass to the compiler, we would have to make changes every place the
compiler is invoked. Needless to say, making such changes is error-prone
and tedious: there has to be a better way.


Remember we said makefiles could have variables? This would be a good time
to start using them. You set makefile variables "just as you'd expect":

MY_VAR = some text here
# no semicolon at the end of line ;-)

When setting Make variables, you should remember that, while whitespace
around the = sign is not significant, any trailing whitespace (whitespace
following the value of the variable, up to the comment character or end of
line) becomes part of the variable's value. So, if you write:

MY_COMPILER = g++ # my favorite compiler

then the value of the MY_COMPILER variable will be "g++ ".


You get at the value of MY_VAR by enclosing the variable name in
parentheses, and sticking a dollar sign in front of the whole thing:
$(MY_VAR). Make conceptually just does a textual substitution of $(MY_VAR)
with "some text here".
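
The trailing-whitespace rule is easy to check for yourself. The sketch below assumes GNU Make; the show target is a made-up helper that prints each value between square brackets so that a trailing space becomes visible. The sed line only turns the <TAB> markers into real tab characters.

```shell
set -e
cd "$(mktemp -d)"                 # work in a scratch directory
tab=$(printf '\t')

sed "s/^<TAB> */$tab/" > Makefile <<'EOF'
MY_VAR = some text here
MY_COMPILER = g++ # my favorite compiler

# Hypothetical helper target: print the values between brackets.
show :
<TAB> @echo "[$(MY_VAR)]"
<TAB> @echo "[$(MY_COMPILER)]"
EOF

output=$(make show)
echo "$output"
```

With GNU Make this should print [some text here] followed by [g++ ], the space before the comment character having become part of the value.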


With that background, we can rewrite the makefile like so:

# C++ compiler is traditionally called CXX
CXX = g++

# Another traditional name for the C++ compiler flags
CXXFLAGS = -g -Wall

# Command to make an object file:
COMPILE = $(CXX) $(CXXFLAGS) -c

# Command used to link objects:
# LD is sort of traditional as well,
# probably stands for Linker-loaDer
LD = $(CXX)

# The flags to pass to the linker
# At UW CSE, if you are using /uns/g++ as your compiler,
# You'll probably want to add -Wl,-rpath,/uns/lib here.
# This way you won't have to set the LD_LIBRARY_PATH environment
# variable just to run the programs you compile.
LDFLAGS = # -Wl,-rpath,/uns/lib

# Any libraries we may want to use
# you would need a -lm here if you were using math functions.
LIBS =

# The name of the executable we are trying to build
PROGRAM = snakeoil

# The object files we need to build the program
OBJS = spam.o eggs.o

$(PROGRAM) : $(OBJS)
<TAB> $(LD) $(LDFLAGS) $(OBJS) $(LIBS) -o $(PROGRAM)

spam.o : spam.cc spam.h common.h
<TAB> $(COMPILE) spam.cc -o spam.o

eggs.o : eggs.cc eggs.h common.h
<TAB> $(COMPILE) eggs.cc -o eggs.o

This is quite an improvement, but we can get even more functionality out of
Make variables. Make has a few variables whose name is a single punctuation
character. These variables are called "automatic variables", and they are
special in that, to quote the GNU Make manual, "these variables have values
computed afresh for each rule that is executed, based on the target and
prerequisites of the rule." Thus, $(@) stands for the target of the current
rule, $(^) stands for all the dependencies of the current rule, and $(<)
stands for the first dependency of the current rule (the first thing in
$(^)). When writing these variables, it is conventional to omit the
parentheses around the variable name, and to write simply $@, $^, and $<
respectively. Using these special variables, we can rewrite the rules
portion of the Makefile like this:

$(PROGRAM) : $(OBJS)
<TAB> $(LD) $(LDFLAGS) $^ $(LIBS) -o $@

spam.o : spam.cc spam.h common.h
<TAB> $(COMPILE) $< -o $@

eggs.o : eggs.cc eggs.h common.h
<TAB> $(COMPILE) $< -o $@
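
One way to see what the automatic variables expand to, without compiling anything, is to ask Make for a dry run. The sketch below assumes GNU Make; it uses a trimmed copy of the makefile above (LDFLAGS and LIBS are left out to keep the output tidy) and empty placeholder files in place of real sources. The sed line only turns the <TAB> markers into real tab characters.

```shell
set -e
cd "$(mktemp -d)"                 # work in a scratch directory
tab=$(printf '\t')

sed "s/^<TAB> */$tab/" > Makefile <<'EOF'
CXX = g++
CXXFLAGS = -g -Wall
COMPILE = $(CXX) $(CXXFLAGS) -c
LD = $(CXX)
PROGRAM = snakeoil
OBJS = spam.o eggs.o

$(PROGRAM) : $(OBJS)
<TAB> $(LD) $^ -o $@

spam.o : spam.cc spam.h common.h
<TAB> $(COMPILE) $< -o $@

eggs.o : eggs.cc eggs.h common.h
<TAB> $(COMPILE) $< -o $@
EOF

# Empty placeholder sources are enough, because nothing is compiled.
touch spam.cc spam.h eggs.cc eggs.h common.h

# -n prints each command after variable expansion instead of running it.
commands=$(make -n)
echo "$commands"
```

The dry run should list two compile commands, each with $< replaced by the .cc file and $@ by the .o file, followed by a link command in which $^ has become spam.o eggs.o.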

Rest Stop 2


You now know how to use Make variables to reduce repetition in the
Makefiles you write. However, if you look at the commands used in the
eggs.o and spam.o rules, you will notice that they are exactly the same.
Can we somehow exploit this similarity? That's the question we take up in
the next section.


Intermission: meet GNU Make


A gnu is an African wildebeest; GNU -- a recursive acronym for "GNU is Not
Unix" -- is a project started by Richard M. Stallman in 1984 to develop a
completely free Unix-like operating system. Many tools found in a typical
Linux distribution, including gcc, gdb, and make come from the GNU project.
The tools produced by the GNU project tend to support most commonly-used
features found in their Unix counterparts, but also to introduce a number
of extensions. From this point on, I will be relying quite heavily on the
features specific to GNU Make. These features may not be found in every
version of Make. Since all Linux distributions come with GNU Make by
default, and GNU Make is readily available for a wide variety of platforms,
I don't think it's a big problem for me to introduce GNU Make extensions
here.


Being Lazy 2: Pattern Rules


We noticed that the commands to compile eggs.cc and spam.cc look exactly the
same. If we had a third source file in our project, say, ham.cc, its
compilation command would probably look exactly the same as the command to
compile eggs.cc and spam.cc. What we really want is to be able to tell
Make: "any time you need to produce a .o file given a .cc file, here is the
command to use." Here is how to write such a rule:

%.o : %.cc
<TAB> $(COMPILE) $< -o $@

As you can see, the % character, when it appears in the target portion of a
rule, acts as a wildcard, matching any non-empty portion of a filename. In
the dependencies portion of the rule, % stands for the same portion of the
filename the % had matched in the target portion of the rule. So, if Make
was trying to produce eggs.o, it would match eggs.o against the pattern %.o
(here % would match eggs) and then replace % with eggs in the dependencies
portion of the rule, (conceptually) producing a rule that looks like this:

eggs.o : eggs.cc
<TAB> $(COMPILE) $< -o $@

Using pattern rules, we can rewrite the portion of our makefile that deals
with compiling the object files like this:

spam.o : spam.cc spam.h common.h

eggs.o : eggs.cc eggs.h common.h

%.o : %.cc
<TAB> $(COMPILE) $< -o $@
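
Pattern rules are not tied to compilers. As a small illustration that runs anywhere GNU Make is installed, the sketch below uses an invented .upper suffix and lets tr play the role of the compiler. The sed line only turns the <TAB> markers into real tab characters.

```shell
set -e
cd "$(mktemp -d)"                 # work in a scratch directory
tab=$(printf '\t')

sed "s/^<TAB> */$tab/" > Makefile <<'EOF'
# Any .upper file can be made from the .txt file with the same stem.
%.upper : %.txt
<TAB> tr a-z A-Z < $< > $@
EOF

echo spam > spam.txt
echo eggs > eggs.txt

make -s spam.upper eggs.upper     # one pattern rule serves both targets
result="$(cat spam.upper) $(cat eggs.upper)"
echo "$result"
```

This should print SPAM EGGS: the single rule was matched once with % standing for spam and once with % standing for eggs.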

Rest Stop 3


You have now seen how pattern rules can help us avoid writing the same
sequence of commands over and over again. You will notice, however, that,
for each object file, we still have to specify the source and header files
it depends on. Do we really have to write those dependencies by hand? The
next section shows how to generate the dependencies automatically.


Being Lazy 3: Automatic Dependencies Generation


How might we avoid having to write out the dependencies by hand? There
really are two issues here: how do we know when a particular object file
needs to be rebuilt, and how do we integrate that knowledge into the build
process. Knowing when to rebuild a particular object file is surprisingly
easy: we just ask the compiler. Given a source file, for example spam.cc,
running:

g++ -M spam.cc

will produce a list of dependencies for the spam.o object file. In our
hypothetical example such list might look like this:

spam.o : spam.cc spam.h common.h ...(long list of system headers omitted).

If you have a Linux shell prompt handy, you might want to try running g++
-M on a couple of C++ source files to see what output is produced.


As an extension, the g++ compiler also provides the -MM command-line option
which lists only the files that are not system include files. Again, you
may want to try running g++ -MM on a couple of C++ source files, and
compare the output with that produced by running g++ -M on those same
files.


So, now that we have the dependency information, how do we integrate it
into the build process? The method recommended in the GNU Make manual is
that we should produce a mini-makefile for each source file in our project,
and then use Make's include directive to pull those mini-makefiles into
the master makefile. The directive to pull those mini-makefiles into the
master makefile is commonly written like this:

-include $(OBJS:.o=.d)

The $(OBJS:.o=.d) syntax means "take the value of the OBJS variable, and
substitute the string .o with the string .d everywhere .o occurs in that
value." The GNU Make manual also gives a pattern rule which can be used to
produce the mini-makefiles. Here is the rule, adjusted to use the C++ compiler
and the -MM option:

%.d: %.cc
<TAB> set -e; $(CXX) -MM $(CPPFLAGS) $< \
<TAB> | sed 's/\($*\)\.o[ :]*/\1.o $@ : /g' > $@; \
<TAB> [ -s $@ ] || rm -f $@

I realize this looks rather daunting, so let me try to explain what is
going on here.


First, you will notice the rule is a pattern rule: it will produce a file
named eggs.d given the file eggs.cc.


Now for the commands:


set -e; $(CXX) -MM $(CPPFLAGS) $< \
The $(CXX) -MM $< should look quite familiar by now: it simply asks
g++ to produce the list of files whose modification will require
the .o object file to be rebuilt. The CPPFLAGS variable traditionally
holds any flags for the preprocessor, most notably the -I flags,
which specify additional directories to find the header files in. The
set -e; has no relation to the compiler invocation per se -- it
simply tells the shell which runs the compiler to exit immediately
should it encounter an error. It is considered good practice to start
shell commands with the set -e; incantation. (But, frankly, I don't
see how it is useful in this particular case).
| sed 's/\($*\)\.o[ :]*/\1.o $@ : /g' > $@; \
One way to see what's going on here, is to take a filename such as
eggs.cc, and substitute into this command: $* becomes eggs and $@
becomes eggs.d. (We have not seen the $* automatic variable yet: in a
pattern rule, it stands for the portion of the target filename that
the % had matched.) So the command becomes:

| sed 's/\(eggs\)\.o[ :]*/\1.o eggs.d : /g' > eggs.d; \

Recall that $(CXX) -MM $(CPPFLAGS) produced output that looked
something like this:

eggs.o : eggs.cc eggs.h common.h

Now this line transforms it to look like this:

eggs.o eggs.d : eggs.cc eggs.h common.h

and then puts the result into the file eggs.d, which, you'll recall, is
the mini-makefile we want to produce. So, the goal of this line is to
tell Make that not only does the eggs.o object file need to be
rebuilt should any of the files eggs.cc, eggs.h, or common.h change,
but that the eggs.d mini-makefile also needs to be rebuilt should any
of those three files change.


Why would we want such a thing? Because, before GNU Make attempts to
build any object files or executables, it ensures that any makefiles
it knows how to rebuild are up-to-date. In particular, it will
rebuild eggs.d should eggs.cc, eggs.h or common.h change.


Now, pretend you modified eggs.cc so it now #include's the spam.h
header file. When you type make on the command line, eggs.cc will
have a more recent modification time than eggs.d, so Make will
rebuild eggs.d, picking up the fact that eggs.o (and eggs.d) now
depend on spam.h in addition to the three files eggs.cc, eggs.h, and
common.h. Make will then re-consider whether eggs.d should be updated
(most likely, it won't need to be), re-read eggs.d, and notice that
eggs.o needs to be rebuilt. Any subsequent changes to spam.h will
cause eggs.o to be rebuilt. In other words, the automatic dependency
generation "just works."


[ -s $@ ] || rm -f $@
This line just removes the .d file if it is empty.
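
The two shell steps explained above can be tried on their own, without Make or a compiler. In the sketch below, the dependency line is written by hand rather than produced by g++ -MM, and $* and $@ have already been replaced by eggs and eggs.d, as Make would do; the file empty.d is a made-up name used only to show the cleanup test.

```shell
set -e
cd "$(mktemp -d)"                 # work in a scratch directory

# A dependency line of the kind g++ -MM prints, written by hand here.
deps='eggs.o : eggs.cc eggs.h common.h'

# The sed command from the rule, with $* and $@ already expanded by Make.
echo "$deps" | sed 's/\(eggs\)\.o[ :]*/\1.o eggs.d : /g' > eggs.d
rewritten=$(cat eggs.d)
echo "$rewritten"

# [ -s FILE ] succeeds only for a non-empty file, so an empty .d file is removed.
: > empty.d
[ -s empty.d ] || rm -f empty.d
[ -s eggs.d ] || rm -f eggs.d
remaining=$(ls)
echo "$remaining"
```

The first line printed should be eggs.o eggs.d : eggs.cc eggs.h common.h, and only eggs.d should be left in the directory afterwards.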


With automatic dependency generation, our Makefile can be rewritten like
this:

# C++ compiler is traditionally called CXX
CXX = g++

# The traditional name for the flags to pass to the preprocessor.
# Useful if you need to specify additional directories to look for
# header files.
CPPFLAGS = # -I../my_headers

# Another traditional name for the C++ compiler flags
CXXFLAGS = -g -Wall

# Command to make an object file:
COMPILE = $(CXX) $(CPPFLAGS) $(CXXFLAGS) -c

# Command used to link objects:
# LD is sort of traditional as well,
# probably stands for Linker-loaDer
LD = $(CXX)

# The flags to pass to the linker
# At UW CSE, if you are using /uns/g++ as your compiler,
# You'll probably want to add -Wl,-rpath,/uns/lib here.
# This way you won't have to set the LD_LIBRARY_PATH environment
# variable just to run the programs you compile.
LDFLAGS = # -Wl,-rpath,/uns/lib

# Any libraries we may want to use
# you would need a -lm here if you were using math functions.
LIBS =

# The name of the executable we are trying to build
PROGRAM = snakeoil

# The object files we need to build the program
OBJS = spam.o eggs.o

$(PROGRAM) : $(OBJS)
<TAB> $(LD) $(LDFLAGS) $^ $(LIBS) -o $@

-include $(OBJS:.o=.d)

%.d : %.cc
<TAB> set -e; $(CXX) -MM $(CPPFLAGS) $< \
<TAB> | sed 's/\($*\)\.o[ :]*/\1.o $@ : /g' > $@; \
<TAB> [ -s $@ ] || rm -f $@

%.o : %.cc
<TAB> $(COMPILE) $< -o $@

Rest Stop 4


Whew! You've seen how to take advantage of the g++ compiler to generate
dependencies for the files in your project automatically. You now have a
boilerplate Makefile you can adapt to projects of your own simply by
changing the PROGRAM and OBJS lines. But this makefile only allows you to
produce a single executable. On the other hand, you may need to produce
more than one executable in the same source code directory. The next
section shows one possible way to handle such a need.


Producing Multiple Executables (while staying lazy)


Let us try to modify the single-program makefile we came up with to handle
building multiple programs. Let us pretend we are building two programs
named cat and dog. We will want to replace our PROGRAM variable with
something like ALL_PROGRAMS, and we will also want to list which object files
are needed to build cat and which are needed to build dog. Finally, we will
want an easy rule to build everything:

ALL_PROGRAMS = cat dog

cat_OBJS = cat.o mammal.o

dog_OBJS = dog.o mammal.o

# Remember that the first rule in the makefile is the default
all : $(ALL_PROGRAMS)

# We don't want Make to get confused if a file named
# "all" should happen to exist, so we say that
# "all" is a PHONY target, i.e., a target that Make
# should always try to update. We do that by making all a
# dependency of the special target .PHONY:
.PHONY : all

Now, we need a rule to actually build cat from its object files, and
another rule to build dog from its object files. But we should not put a
rule like:

dog : $(dog_OBJS)
<TAB> $(LD) $(LDFLAGS) $^ $(LIBS) -o $@

into the makefile. Why not? Because then we will need to add another rule
like this to build cat, and, when we add another executable named cow, we
will need to add yet another rule to build cow, etc. In other words, the
rules to build cat, dog, cow, etc., from their respective object files
should be generated, not hard-coded.


One approach to getting make to generate such link rules might be as
follows: for each program we want to build, we produce a .link
mini-makefile which contains the rule to build that program. Then we use
Make's include directive to pull all such mini-makefiles into the
master makefile. (If this sounds familiar, it should -- this is exactly the
approach we used to automate generating dependencies).

# The only file a .link file really properly depends on is the
# makefile itself. However, I could not find a way to tell
# which makefile GNU make is currently processing. So, instead,
# I always force the .link file to be remade. This is a
# drawback of my approach.
%.link : always_remake
<TAB> echo '$* : $($*_OBJS)' >$@
<TAB> echo '#$$(LD) $$(LDFLAGS) $$^ $$(LIBS) -o $$@' | tr '#' '\t' >>$@
.PHONY : always_remake

To understand this rule, you'll need to recall that $* stands for the part
of the filename that the % had matched in the rule's target. The $($*_OBJS)
syntax requires explanation as well; it is probably easiest to explain with
an example. Let's say we are building the file cat.link. In that case % in
the target will match cat, and $* will have the value cat as well. Now, to
find the value of $($*_OBJS), Make will substitute cat for $*, producing
$(cat_OBJS). It will then substitute the value of the cat_OBJS variable,
which is cat.o mammal.o. The GNU Make manual refers to that feature as
"Computed Variable Names"; this feature can come in quite handy at times:
we'll see another example of its use shortly. Finally, the doubled dollar
signs are how you quote the dollar sign in makefiles: this way, what ends up
in the mini-makefile we are generating is the string $@, rather than a
string like cat.link (which is a possible expansion of $@ in this rule).
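
To see what such a generated mini-makefile actually contains, the rule can be run in isolation. The sketch below assumes GNU Make and defines only cat_OBJS next to the %.link rule; it then prints cat.link with its leading tab shown as <TAB> again. The first sed line turns the <TAB> markers into real tab characters, and the second one reverses that for display.

```shell
set -e
cd "$(mktemp -d)"                 # work in a scratch directory
tab=$(printf '\t')

sed "s/^<TAB> */$tab/" > Makefile <<'EOF'
cat_OBJS = cat.o mammal.o

%.link : always_remake
<TAB> echo '$* : $($*_OBJS)' >$@
<TAB> echo '#$$(LD) $$(LDFLAGS) $$^ $$(LIBS) -o $$@' | tr '#' '\t' >>$@
.PHONY : always_remake
EOF

make -s cat.link

# Show the generated mini-makefile, with its tab written as <TAB> again.
generated=$(sed "s/^$tab/<TAB> /" cat.link)
echo "$generated"
```

The generated file should hold the line cat : cat.o mammal.o, followed by a command line in which $(LD), $^ and $@ appear literally, because the doubled dollar signs kept Make from expanding them while it wrote the file.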


Now that we got past generating the rules for linking (or hacked our way
around the problem, depending on your point of view), the only remaining
issue we have to handle is the fact that in addition to rebuilding
everything (by typing make or make all), we would also like to be able to
say make cat, or make dog. If we type make cat, then the dog.link
mini-makefile should not be re-generated, and neither should the dog.d
mini-makefile.


So, inside the makefile, we would like to know what it is the user has
asked us to build. Luckily, GNU Make provides a built-in variable named
MAKECMDGOALS which contains just the information we want. The plan is to
examine the MAKECMDGOALS variable, and to set up a list of programs which
should be built:

# The special variable MAKECMDGOALS contains the list of targets the
# user has supplied on the "make" command line. The
# MAKECMDGOALS variable will only be "defined" (will have a
# value) if the user had actually provided a target on the command
# line.
ifdef MAKECMDGOALS
# the user specified a target on the command line:
# was it "all", or was it something else?
ifeq "$(MAKECMDGOALS)" "all"
programs_to_build = $(ALL_PROGRAMS)
else
programs_to_build = $(MAKECMDGOALS)
endif # ? "$(MAKECMDGOALS)" == "all"
else # ! MAKECMDGOALS
# the user didn't specify a target on the command line
# (i.e., they just typed "make"):
# act as if they had specified the "all" target
programs_to_build = $(ALL_PROGRAMS)
endif # ? MAKECMDGOALS

As you'd expect, the ifdef conditional tests if a variable has a value, and
the ifeq conditional tests if its arguments are equal.
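
The conditional can be exercised without building any programs. In the sketch below, which assumes GNU Make, the targets all, cat and dog are stand-ins that merely report the value of programs_to_build; the comments on the else and endif lines are dropped for brevity. The sed line only turns the <TAB> markers into real tab characters.

```shell
set -e
cd "$(mktemp -d)"                 # work in a scratch directory
tab=$(printf '\t')

sed "s/^<TAB> */$tab/" > Makefile <<'EOF'
ALL_PROGRAMS = cat dog

ifdef MAKECMDGOALS
ifeq "$(MAKECMDGOALS)" "all"
programs_to_build = $(ALL_PROGRAMS)
else
programs_to_build = $(MAKECMDGOALS)
endif
else
programs_to_build = $(ALL_PROGRAMS)
endif

# Stand-in targets that only report the result instead of building anything.
all cat dog :
<TAB> @echo "programs_to_build = $(programs_to_build)"
.PHONY : all cat dog
EOF

no_goal=$(make)
goal_all=$(make all)
goal_cat=$(make cat)
printf '%s\n' "make     -> $no_goal" "make all -> $goal_all" "make cat -> $goal_cat"
```

Both make and make all should report cat dog, while make cat should report cat alone.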


Now that we have a list of programs the user wants to build, we produce
from it three more lists: the list of .link mini-makefiles we will need,
the list of object files we will need, and the list of .d mini-makefiles we
will need:

dot_link_files = $(programs_to_build:%=%.link)
objects_to_compile = $(sort $(foreach program,$(programs_to_build),$($(program)_OBJS)))
dot_d_files = $(objects_to_compile:.o=.d)

Here, the foreach function takes the (whitespace-separated) list of strings
in $(programs_to_build), and produces and returns another
(whitespace-separated) list. To produce the return value, it sets the
variable program to each element of the $(programs_to_build) list in turn,
and expands the expression $($(program)_OBJS). The return value is then the
combination of all these expansions separated by whitespace (so, the
foreach function is very similar in spirit to the map/mapcar family of
functions in Lisp). Here, the sort function is used simply to remove
duplicates from the list the foreach function returns.
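
The three lists can be inspected directly. The sketch below assumes GNU Make, fixes programs_to_build to cat dog, and adds a made-up show target that prints each list. The sed line only turns the <TAB> markers into real tab characters.

```shell
set -e
cd "$(mktemp -d)"                 # work in a scratch directory
tab=$(printf '\t')

sed "s/^<TAB> */$tab/" > Makefile <<'EOF'
programs_to_build = cat dog
cat_OBJS = cat.o mammal.o
dog_OBJS = dog.o mammal.o

dot_link_files = $(programs_to_build:%=%.link)
objects_to_compile = $(sort $(foreach program,$(programs_to_build),$($(program)_OBJS)))
dot_d_files = $(objects_to_compile:.o=.d)

# Hypothetical helper target: print the three lists.
show :
<TAB> @echo "links:   $(dot_link_files)"
<TAB> @echo "objects: $(objects_to_compile)"
<TAB> @echo "deps:    $(dot_d_files)"
EOF

lists=$(make show)
echo "$lists"
```

Note that mammal.o, which both programs need, should appear only once in the objects list: foreach returns it twice, and sort drops the duplicate.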


We are almost done: all that is left now is to pull the needed .link and .d
mini-makefiles into the master makefile, using the include directive. But
we only want to do so if the $(objects_to_compile) variable actually
contains something besides whitespace (it would be rather pointless to
re-generate the .d and .link mini-makefiles only to remove them, if we are
running the clean rule, for example):

ifneq "$(strip $(objects_to_compile))" ""
-include $(dot_link_files)
-include $(dot_d_files)
endif
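
The guard itself can be checked in isolation. In the sketch below, which assumes GNU Make, the include lines are replaced by a made-up decision variable so that nothing needs to exist on disk, and objects_to_compile is supplied on the command line instead of being computed. The sed line only turns the <TAB> markers into real tab characters.

```shell
set -e
cd "$(mktemp -d)"                 # work in a scratch directory
tab=$(printf '\t')

sed "s/^<TAB> */$tab/" > Makefile <<'EOF'
ifneq "$(strip $(objects_to_compile))" ""
decision = include the mini-makefiles
else
decision = skip the mini-makefiles
endif

# Hypothetical helper target: report which branch was taken.
show :
<TAB> @echo "$(decision)"
EOF

# Variables given on the command line are visible while the makefile is read.
with_objects=$(make show objects_to_compile='cat.o mammal.o')
only_blanks=$(make show objects_to_compile='   ')
echo "$with_objects"
echo "$only_blanks"
```

A value made up of nothing but blanks is reduced to the empty string by strip, so the second run should take the skip branch.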

Here is the makefile that resulted. I hope you find it useful for your
projects.


For More Information


The best source of information about GNU Make is the GNU Make reference
manual. The manual is available on most Linux machines in the Info format,
and can be read by typing info make at the shell prompt. The manual is also
available on the web at http://www.gnu.org/manual/make/


Thank You For Listening


This is all I have to say on the subject of makefiles. I would like to hear
any comments you may have after reading this tutorial. I am especially
interested in hearing (constructive) criticism and suggestions for
improvement, but all comments are welcome.

Evgeny Roubinchtein
Last modified: Sat Dec 29 12:23:18 PST 2001
