<br><br><div class="gmail_quote">---------- Forwarded message ----------<br>From: <b class="gmail_sendername">Peter Collingbourne</b> <span dir="ltr"><<a href="mailto:peter@pcc.me.uk">peter@pcc.me.uk</a>></span><br>Date: Wed, Sep 7, 2011 at 9:17 PM<br>
Subject: Proposal: restat rules<br>To: <a href="mailto:ninja-build@googlegroups.com">ninja-build@googlegroups.com</a><br><br><br>Hi,<br>
<br>
In this email I'll try to explain one of the oddities of make (which<br>
some CMake-based build systems rely on), and why we can't currently<br>
express this in Ninja. I'll also propose how we could extend Ninja<br>
to support this behaviour.<br>
<br>
(Warning, long essay ahead.)<br>
<br>
In the LLVM project we maintain a code generator called tblgen,<br>
the purpose of which is to generate a number of header files<br>
containing various metadata. From time to time, tblgen and its<br>
dependent libraries will be modified, causing a rebuild of tblgen.<br>
Strictly speaking, we should now rebuild the header files, and all of<br>
their reverse dependencies, even if the generated files did not change.<br>
This is suboptimal, because the reverse dependencies constitute<br>
every object file built from a source file that transitively includes<br>
one of the tblgen generated files (which is >50% of object files in<br>
the build).<br>
<br>
To avoid this problem, we cause tblgen to write its output to a<br>
temporary file, and use a utility to copy the temporary file over the<br>
target file only if the temporary file is different from the target.<br>
In makefile terms, it looks something like this:<br>
<br>
-----<br>
all: outputuser.o<br>
<br>
tblgen: tblgen.cpp<br>
touch tblgen<br>
<br>
output.inc.tmp: tblgen<br>
touch output.inc.tmp<br>
<br>
output.inc: output.inc.tmp<br>
if cmp output.inc.tmp output.inc ; then : ; else cp output.inc.tmp output.inc ; fi<br>
<br>
outputuser.o: outputuser.cpp output.inc<br>
touch outputuser.o<br>
-----<br>
<br>
Note what happens during an incremental build where tblgen is the<br>
only dirty file, but its output file output.inc.tmp does not change<br>
relative to output.inc. make initially schedules a rebuild of tblgen,<br>
output.inc.tmp, output.inc and outputuser.o. After output.inc has<br>
been rebuilt, its timestamp remains the same as before the build.<br>
Before make begins to rebuild outputuser.o, it will re-evaluate the<br>
dirty state of outputuser.o based on the timestamps of its inputs<br>
(i.e. it will stat them again). Because outputuser.cpp and output.inc<br>
are both older than outputuser.o, make doesn't rebuild it after all,<br>
despite it being initially scheduled for a build.<br>
<br>
The behaviour is different in Ninja, which operates in two phases:<br>
the scheduling of the build and the build itself. Like make, Ninja<br>
will schedule a rebuild of tblgen, output.inc.tmp, output.inc and<br>
outputuser.o. Unlike make, it will rebuild outputuser.o, because<br>
it does not re-stat inputs during the build. The key observation<br>
here is that unlike make, Ninja currently provides no mechanism for<br>
pruning the scheduled build graph during a build using a build rule.<br>
<br>
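The difference between the two behaviours can be sketched with a toy<br>
Python model of the example above. This is only an illustration under<br>
simplifying assumptions, not how make or Ninja is implemented: the<br>
names RULES, ORDER, plan and build are made up, timestamps are small<br>
integers, and the compare-and-copy step is modelled as a rule that<br>
never changes its output's timestamp.<br>
<br>
<pre>
```python
# Toy model: each target maps to (inputs, changes_mtime).
# changes_mtime=False models the compare-and-copy step when the
# temporary file and the target have identical contents.
RULES = {
    "tblgen": (["tblgen.cpp"], True),
    "output.inc.tmp": (["tblgen"], True),
    "output.inc": (["output.inc.tmp"], False),
    "outputuser.o": (["outputuser.cpp", "output.inc"], True),
}
ORDER = ["tblgen", "output.inc.tmp", "output.inc", "outputuser.o"]  # topological


def initial_mtimes():
    """Only tblgen.cpp is newer than everything else."""
    mtimes = {name: 1 for name in ORDER}
    mtimes.update({"tblgen.cpp": 2, "outputuser.cpp": 1})
    return mtimes


def plan(mtimes):
    """Targets that look dirty before any command has run."""
    dirty = set()
    for target in ORDER:
        inputs, _ = RULES[target]
        if any(i in dirty or mtimes[i] > mtimes[target] for i in inputs):
            dirty.add(target)
    return dirty


def build(recheck, now=3):
    """Run the plan; with recheck=True, stat the inputs again before each step."""
    mtimes = initial_mtimes()
    scheduled = plan(mtimes)
    ran = []
    for target in ORDER:
        if target not in scheduled:
            continue
        inputs, changes_mtime = RULES[target]
        if recheck and not any(mtimes[i] > mtimes[target] for i in inputs):
            continue  # make-like: inputs are not newer after all, so skip
        ran.append(target)
        if changes_mtime:
            mtimes[target] = now
    return ran


print(build(recheck=True))   # make-like: outputuser.o is skipped
print(build(recheck=False))  # two-phase: all four scheduled targets run
```
</pre>
<br>
In this model both variants schedule the same four targets; only the<br>
variant that stats inputs again drops outputuser.o from the build.<br>
<br>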
What I propose for Ninja is that we implement this pruning behaviour<br>
in a similar way to make, but only for specific rules with a special<br>
variable set on the rule. We can call this variable "restat"<br>
(suggestions for better names are welcome). If this variable is<br>
present on a rule, Ninja will, after executing the rule command,<br>
re-stat each output file to obtain its modification time. If the<br>
modification time is unchanged from when Ninja initially stat'ed the<br>
file before starting the build, Ninja will mark that output file as<br>
clean, and recursively for each reverse dependency of the output file,<br>
recompute its dirty status.<br>
<br>
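The proposed pruning step could look roughly like the following Python<br>
sketch. It is only meant to illustrate the idea, not to describe<br>
Ninja's code: restat_prune and needs_rebuild are hypothetical names,<br>
the graph is a pair of plain dictionaries, and timestamps are small<br>
integers.<br>
<br>
<pre>
```python
def needs_rebuild(node, inputs_of, mtimes, dirty):
    """A node is dirty if any input is itself dirty or newer than the node."""
    return any(i in dirty or mtimes[i] > mtimes[node] for i in inputs_of[node])


def restat_prune(output, mtime_before, mtimes, inputs_of, users_of, dirty):
    """Hypothetical hook run after a restat rule's command has finished."""
    if mtimes[output] != mtime_before:
        return  # the command really updated the output; keep the plan as is
    dirty.discard(output)
    pending = list(users_of.get(output, []))
    while pending:
        node = pending.pop()
        if node in dirty and not needs_rebuild(node, inputs_of, mtimes, dirty):
            dirty.discard(node)
            # This node became clean, so its own users must be looked at too.
            pending.extend(users_of.get(node, []))


# State just after the compare-and-copy command left output.inc untouched.
inputs_of = {
    "output.inc": ["output.inc.tmp"],
    "outputuser.o": ["outputuser.cpp", "output.inc"],
}
users_of = {"output.inc": ["outputuser.o"]}
mtimes = {"output.inc.tmp": 3, "output.inc": 1,
          "outputuser.cpp": 1, "outputuser.o": 1}
dirty = {"output.inc", "outputuser.o"}

restat_prune("output.inc", 1, mtimes, inputs_of, users_of, dirty)
print(sorted(dirty))  # prints []
```
</pre>
<br>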
As an improvement over what make does, Ninja then stores the current<br>
timestamp in the build log entry associated with the output file.<br>
This timestamp will be treated by future invocations of Ninja as the<br>
output file's modification time instead of the output file's actual<br>
modification time for the purpose of deciding whether it is dirty<br>
(but not whether its reverse dependencies are dirty).<br>
<br>
To give an example of how this would look, here is the above makefile<br>
translated to Ninja:<br>
<br>
-----<br>
rule touch<br>
command = touch $out<br>
<br>
rule cpifdiff<br>
command = if cmp $in $out ; then : ; else cp $in $out ; fi<br>
restat = true<br>
<br>
build tblgen: touch tblgen.cpp<br>
build output.inc.tmp: touch tblgen<br>
build output.inc: cpifdiff output.inc.tmp<br>
build outputuser.o: touch outputuser.cpp output.inc<br>
<br>
default outputuser.o<br>
-----<br>
<br>
Now consider what happens when Ninja is asked at timestamp 3 to rebuild<br>
the default target (outputuser.o) where tblgen.cpp has timestamp 2 and<br>
all other files have timestamp 1. Ninja will schedule a rebuild of<br>
tblgen, output.inc.tmp, output.inc and outputuser.o. Again, suppose<br>
that the contents of output.inc.tmp are equal to output.inc when built.<br>
So after output.inc has been rebuilt, it still has timestamp 1.<br>
Ninja will notice that the timestamp is the same as at the start of<br>
the build, and will mark output.inc as clean. This is propagated<br>
through to outputuser.o, which is also marked clean. So no further<br>
rebuilding is needed. Ninja will also associate the timestamp 3 with<br>
output.inc in the build log.<br>
<br>
Suppose that Ninja is invoked again immediately afterwards. The build<br>
planner compares output.inc's timestamp in the build log, 3, against<br>
the modification time of output.inc.tmp, also 3, so output.inc is<br>
marked as clean. It then compares output.inc's actual modification<br>
time, 1, against outputuser.o's modification time, also 1, so<br>
outputuser.o is also marked as clean, and this is a no-op build.<br>
<br>
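The second invocation can be checked with a small Python sketch of the<br>
planner. This assumes a simplified build log that maps an output to<br>
the timestamp recorded for it; the function name plan and the<br>
dictionary layout are illustrative, not Ninja's actual log format.<br>
<br>
<pre>
```python
def plan(order, inputs_of, mtimes, log):
    """Return the outputs a fresh invocation would consider dirty."""
    dirty = set()
    for out in order:  # order is assumed to be topological
        # The logged timestamp stands in for the output's own mtime only.
        own_time = log.get(out, mtimes[out])
        # Inputs are always compared using their actual mtimes.
        if any(i in dirty or mtimes[i] > own_time for i in inputs_of[out]):
            dirty.add(out)
    return dirty


order = ["tblgen", "output.inc.tmp", "output.inc", "outputuser.o"]
inputs_of = {
    "tblgen": ["tblgen.cpp"],
    "output.inc.tmp": ["tblgen"],
    "output.inc": ["output.inc.tmp"],
    "outputuser.o": ["outputuser.cpp", "output.inc"],
}
# File system state after the build at timestamp 3 described above.
mtimes = {"tblgen.cpp": 2, "tblgen": 3, "output.inc.tmp": 3,
          "output.inc": 1, "outputuser.cpp": 1, "outputuser.o": 1}

with_log = plan(order, inputs_of, mtimes, log={"output.inc": 3})
without_log = plan(order, inputs_of, mtimes, log={})
print(sorted(with_log))     # prints []
print(sorted(without_log))  # prints ['output.inc', 'outputuser.o']
```
</pre>
<br>
Without the logged timestamp, output.inc would look out of date on<br>
every invocation, because output.inc.tmp stays newer than it.<br>
<br>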
There's a small UI issue here in that the total number of files to be<br>
rebuilt is unknown during the course of a rebuild until all restat<br>
edges are done. There are a number of options for presenting the<br>
build status until this happens. I can think of:<br>
<br>
1) Show the maximum number of files that could be rebuilt at the<br>
current time, and allow this number to drop.<br>
1a) Same as 1, but prioritise the restat edges in order to show a<br>
correct total to the user as quickly as possible.<br>
2) Keep the total constant, and treat any skipped outputs as completed.<br>
3) Display a question mark in place of the total until all restat<br>
edges are done.<br>
3a) Same as 3, but prioritise the restat edges.<br>
<br>
I'm leaning towards 1 at the moment, unless the prioritisation turns<br>
out to be easy, in which case I'd go for 3a.<br>
<br>
Thanks for reading... thoughts?<br>
<br>
Thanks,<br>
--<br>
<font color="#888888">Peter<br>
</font></div>