[CMake] FYI - From Ninja-build mailing list - Fwd: Proposal: restat rules

Jean-Christophe Fillion-Robin jchris.fillionr at kitware.com
Wed Sep 7 23:04:42 EDT 2011


---------- Forwarded message ----------
From: Peter Collingbourne <peter at pcc.me.uk>
Date: Wed, Sep 7, 2011 at 9:17 PM
Subject: Proposal: restat rules
To: ninja-build at googlegroups.com


Hi,

In this email I'll try to explain one of the oddities of make (which
some CMake-based build systems rely on), and why we can't currently
express this in Ninja.  I'll also propose how we could extend Ninja
to support this behaviour.

(Warning, long essay ahead.)

In the LLVM project we maintain a code generator called tblgen,
the purpose of which is to generate a number of header files
containing various metadata.  From time to time, tblgen and its
dependent libraries will be modified, causing a rebuild of tblgen.
Strictly speaking, we should now rebuild the header files, and all of
their reverse dependencies, even if the generated file did not change.
This is suboptimal, because the reverse dependencies constitute
every object file built from a source file that transitively includes
one of the tblgen generated files (which is >50% of object files in
the build).

To avoid this problem, we cause tblgen to write its output to a
temporary file, and use a utility to copy the temporary file over the
target file only if the temporary file is different from the target.
In makefile terms, it looks something like this:

-----
all: outputuser.o

tblgen: tblgen.cpp
       touch tblgen

output.inc.tmp: tblgen
       touch output.inc.tmp

output.inc: output.inc.tmp
       if cmp output.inc.tmp output.inc ; then : ; else cp output.inc.tmp output.inc ; fi

outputuser.o: outputuser.cpp output.inc
       touch outputuser.o
-----
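For readers unfamiliar with the cmp/cp idiom, the copy-if-different step performed by the output.inc rule can be sketched in Python. This is a minimal illustration, not the utility LLVM actually uses; the name copy_if_different is hypothetical. When the contents match, the target is never written, so its modification time stays where it was.

```python
import shutil
from pathlib import Path


def copy_if_different(src: str, dst: str) -> bool:
    """Copy src over dst only if dst is missing or its contents differ.

    Returns True if dst was written, False if it was left untouched
    (in which case dst keeps its old modification time).
    """
    dst_path = Path(dst)
    if dst_path.exists() and dst_path.read_bytes() == Path(src).read_bytes():
        return False  # same contents: do not touch dst, so its mtime is kept
    shutil.copyfile(src, dst)  # missing or different: overwrite dst
    return True
```

Usage would be copy_if_different("output.inc.tmp", "output.inc"). Comparing whole file contents (rather than sizes or timestamps) mirrors what cmp does.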

Note what happens during an incremental build where tblgen is the
only dirty file, but its output file output.inc.tmp does not change
relative to output.inc.  make initially schedules a rebuild of tblgen,
output.inc.tmp, output.inc and outputuser.o.  After output.inc has
been rebuilt, its timestamp remains the same as before the build.
Before make begins to rebuild outputuser.o, it will re-evaluate the
dirty state of outputuser.o based on the timestamps of its inputs
(i.e. it will stat them again).  Because outputuser.cpp and output.inc
are both older than outputuser.o, make doesn't rebuild it after all,
despite it being initially scheduled for a build.

The behaviour is different in Ninja, which operates in two phases:
the scheduling of the build and the build itself.  Like make, Ninja
will schedule a rebuild of tblgen, output.inc.tmp, output.inc and
outputuser.o.  Unlike make, it will rebuild outputuser.o, because
it does not re-stat inputs during the build.  The key observation
here is that unlike make, Ninja currently provides no mechanism for
pruning the scheduled build graph during a build using a build rule.
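The difference between make's and Ninja's behaviour can be reproduced with a toy model. The sketch below assumes integer timestamps, a clock fixed at 3, tblgen.cpp at timestamp 2 and every other file at 1, and that the copy step for output.inc finds identical contents and leaves the file untouched. The names simulate and restat_before_each are hypothetical, and neither tool is actually implemented this way.

```python
from typing import Dict, List, Tuple

# Edges (output, inputs) from the example, listed in build order.
EDGES: List[Tuple[str, List[str]]] = [
    ("tblgen", ["tblgen.cpp"]),
    ("output.inc.tmp", ["tblgen"]),
    ("output.inc", ["output.inc.tmp"]),
    ("outputuser.o", ["outputuser.cpp", "output.inc"]),
]


def is_dirty(output: str, inputs: List[str], mtime: Dict[str, int]) -> bool:
    """Out of date if any input is newer than the output."""
    return any(mtime[i] > mtime[output] for i in inputs)


def simulate(restat_before_each: bool) -> List[str]:
    """Return the outputs whose commands actually run."""
    now = 3  # assumed clock value while building
    mtime = {"tblgen.cpp": 2, "tblgen": 1, "output.inc.tmp": 1,
             "output.inc": 1, "outputuser.cpp": 1, "outputuser.o": 1}

    # Phase 1: schedule dirty edges and everything downstream of them.
    scheduled = set()
    for out, ins in EDGES:
        if is_dirty(out, ins, mtime) or any(i in scheduled for i in ins):
            scheduled.add(out)

    # Phase 2: run the schedule.
    ran = []
    for out, ins in EDGES:
        if out not in scheduled:
            continue
        # make-like: look at the inputs' timestamps again before running.
        if restat_before_each and not is_dirty(out, ins, mtime):
            continue
        ran.append(out)
        # Assumption: copy-if-different finds identical contents, so
        # output.inc keeps its old timestamp; every other command touches.
        if out != "output.inc":
            mtime[out] = now
    return ran


print(simulate(restat_before_each=True))   # -> ['tblgen', 'output.inc.tmp', 'output.inc']
print(simulate(restat_before_each=False))  # -> ['tblgen', 'output.inc.tmp', 'output.inc', 'outputuser.o']
```

Both runs schedule the same four edges; only the make-like run notices, just before outputuser.o, that none of its inputs is newer any more.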

What I propose for Ninja is that we implement this pruning behaviour
in a similar way to make, but only for specific rules with a special
variable set on the rule.  We can call this variable "restat"
(suggestions for better names are welcome).  If this variable is
present on a rule, Ninja will, after executing the rule command,
re-stat each output file to obtain its modification time.  If the
modification time is unchanged from when Ninja initially stat'ed the
file before starting the build, Ninja will mark that output file as
clean, and recursively for each reverse dependency of the output file,
recompute its dirty status.
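The pruning step proposed above can be sketched as follows. This is a hypothetical illustration, not Ninja code: it assumes one output per edge, that every file already exists, and that "dirty" means "scheduled by the planner and not yet pruned" (edges that have already run stay in the set, so their consumers remain dirty). The names after_restat_edge and still_dirty are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Toy graph from the example: output -> inputs (one output per edge).
INPUTS: Dict[str, List[str]] = {
    "tblgen": ["tblgen.cpp"],
    "output.inc.tmp": ["tblgen"],
    "output.inc": ["output.inc.tmp"],
    "outputuser.o": ["outputuser.cpp", "output.inc"],
}


def reverse_deps(inputs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each file to the outputs that consume it."""
    rdeps: Dict[str, List[str]] = defaultdict(list)
    for out, ins in inputs.items():
        for i in ins:
            rdeps[i].append(out)
    return rdeps


def still_dirty(node: str, mtime: Dict[str, int], dirty: Set[str]) -> bool:
    """Dirty if any input is still dirty or newer than the node."""
    return any(i in dirty or mtime[i] > mtime[node] for i in INPUTS[node])


def after_restat_edge(output: str, initial_mtime: Dict[str, int],
                      mtime: Dict[str, int], dirty: Set[str]) -> None:
    """Called once a restat rule has run and `output` has been re-stat'ed."""
    if mtime[output] != initial_mtime[output]:
        return  # output really changed: its reverse dependencies stay dirty
    dirty.discard(output)
    rdeps = reverse_deps(INPUTS)
    stack = [output]
    while stack:  # walk reverse dependencies, pruning whatever became clean
        node = stack.pop()
        for user in rdeps.get(node, []):
            if user in dirty and not still_dirty(user, mtime, dirty):
                dirty.discard(user)
                stack.append(user)


# State just after output.inc's cpifdiff command ran without copying.
initial = {"tblgen.cpp": 2, "tblgen": 1, "output.inc.tmp": 1,
           "output.inc": 1, "outputuser.cpp": 1, "outputuser.o": 1}
mtime = {**initial, "tblgen": 3, "output.inc.tmp": 3}
dirty = {"tblgen", "output.inc.tmp", "output.inc", "outputuser.o"}
after_restat_edge("output.inc", initial, mtime, dirty)
print(sorted(dirty))  # -> ['output.inc.tmp', 'tblgen']
```

Recursion only continues through nodes that actually became clean, so a consumer with some other changed input is never pruned.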

As an improvement over what make does, Ninja then stores the current
timestamp in the build log entry associated with the output file.
This timestamp will be treated by future invocations of Ninja as the
output file's modification time instead of the output file's actual
modification time for the purpose of deciding whether it is dirty
(but not whether its reverse dependencies are dirty).
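The build-log rule amounts to a dirty check that uses the logged timestamp only when a file is the output being checked, and its real modification time whenever it appears as an input. Below is a minimal sketch under the same toy assumptions (integer timestamps, a plain dict standing in for the build log); plan is a hypothetical name, not part of Ninja.

```python
from typing import Dict, List

# Toy graph (output -> inputs), listed in build order.
GRAPH: Dict[str, List[str]] = {
    "tblgen": ["tblgen.cpp"],
    "output.inc.tmp": ["tblgen"],
    "output.inc": ["output.inc.tmp"],
    "outputuser.o": ["outputuser.cpp", "output.inc"],
}


def plan(mtime: Dict[str, int], build_log: Dict[str, int]) -> List[str]:
    """Return the outputs a fresh invocation would schedule, in build order."""
    scheduled: List[str] = []
    for out in GRAPH:  # dicts preserve insertion order, i.e. build order here
        # The output's own age: the logged restat timestamp if there is one.
        own = build_log.get(out, mtime[out])
        # Inputs are always judged by their real modification times.
        if any(i in scheduled or mtime[i] > own for i in GRAPH[out]):
            scheduled.append(out)
    return scheduled


# On-disk state after the first build at timestamp 3.
mtime = {"tblgen.cpp": 2, "tblgen": 3, "output.inc.tmp": 3,
         "output.inc": 1, "outputuser.cpp": 1, "outputuser.o": 1}
print(plan(mtime, {"output.inc": 3}))  # -> []
print(plan(mtime, {}))                 # -> ['output.inc', 'outputuser.o']
```

Without the log entry, every later invocation would schedule output.inc again (output.inc.tmp at 3 is newer than output.inc at 1), and with it outputuser.o.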

To give an example of how this would look, here is the above makefile
translated to Ninja:

-----
rule touch
 command = touch $out

rule cpifdiff
 command = if cmp $in $out ; then : ; else cp $in $out ; fi
 restat = true

build tblgen: touch tblgen.cpp
build output.inc.tmp: touch tblgen
build output.inc: cpifdiff output.inc.tmp
build outputuser.o: touch outputuser.cpp output.inc

default outputuser.o
-----

Now consider what happens when Ninja is asked at timestamp 3 to rebuild
the default target (outputuser.o) where tblgen.cpp has timestamp 2 and
all other files have timestamp 1.  Ninja will schedule a rebuild of
tblgen, output.inc.tmp, output.inc and outputuser.o.  Again, suppose
that the contents of output.inc.tmp are equal to output.inc when built.
So after output.inc has been rebuilt, it still has a timestamp 1.
Ninja will notice that the timestamp is the same as at the start of
the build, and will mark output.inc as clean.  This is propagated
through to outputuser.o, which is also marked clean.  So no further
rebuilding is needed.  Ninja will also associate the timestamp 3 with
output.inc in the build log.

Suppose that Ninja is invoked again immediately afterwards.  The build
planner compares output.inc's timestamp in the build log, 3, against
the modification time of output.inc.tmp, also 3, so it is marked
as clean. It then compares output.inc's actual modification time,
1, against outputuser.o's modification time, also 1, so it is also
marked as clean, and this is a no-op build.

There's a small UI issue here in that the total number of files to be
rebuilt is unknown during the course of a rebuild until all restat
edges are done.  There are a number of options for presenting the
build status until this happens.  I can think of:

1) Show the maximum number of files that could be rebuilt at the
  current time, and allow this number to drop.
  1a) Same as 1, but prioritise the restat edges in order to show a
      correct total to the user as quickly as possible.
2) Keep the total constant, and treat any skipped outputs as completed.
3) Display a question mark in place of the total until all restat
  edges are done.
  3a) Same as 3, but prioritise the restat edges.

I'm leaning towards 1 at the moment, unless the prioritisation turns
out to be easy, in which case I'd go for 3a.
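To make the display options concrete, here is a hypothetical status formatter for options 1, 2 and 3. The "[done/total]" format and all parameter names are assumptions for illustration; the prioritisation in 1a and 3a is a scheduling question and is not modelled.

```python
def format_status(finished: int, skipped: int, initial_total: int,
                  restat_pending: int, option: int) -> str:
    """Format a progress line.

    finished:       edges whose commands have run so far
    skipped:        edges pruned by restat so far
    initial_total:  edges scheduled by the planner at the start
    restat_pending: restat edges that have not run yet
    """
    if option == 1:  # maximum that could still be built; allowed to drop
        return f"[{finished}/{initial_total - skipped}]"
    if option == 2:  # constant total; pruned edges count as completed
        return f"[{finished + skipped}/{initial_total}]"
    if option == 3:  # total unknown until every restat edge has run
        total = "?" if restat_pending else str(initial_total - skipped)
        return f"[{finished}/{total}]"
    raise ValueError(f"unknown option {option!r}")


# Mid-build in the example: tblgen and output.inc.tmp done, output.inc pending.
print(format_status(2, 0, 4, 1, option=1))  # -> [2/4]
print(format_status(2, 0, 4, 1, option=3))  # -> [2/?]
```

Once output.inc has run and outputuser.o is pruned, options 1 and 3 both show [3/3] and option 2 shows [4/4].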

Thanks for reading... thoughts?

Thanks,
--
Peter


