Rust builds systematically time out

  • Done
Details
8 participants
  • Ivan Petkov
  • John Soo
  • Ludovic Courtès
  • Maxim Cournoyer
  • mikadoZero
  • Mathieu Othacehe
  • Pierre Langlois
  • zimoun
Owner
unassigned
Submitted by
Ludovic Courtès
Severity
important
Ludovic Courtès wrote on 4 Apr 2019 10:59
(address . bug-Guix@gnu.org)
878swqtabb.fsf@gnu.org
Hello,

On berlin, Rust 1.24.1 builds systematically exceed the timeout:

--8<---------------cut here---------------start------------->8---
Building stage1 compiler artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
Compiling arena v0.0.0 (file:///tmp/guix-build-rust-1.24.1.drv-0/rustc-1.24.1-src/src/libarena)
Compiling rustc_driver v0.0.0 (file:///tmp/guix-build-rust-1.24.1.drv-0/rustc-1.24.1-src/src/librustc_driver)

[...]

Compiling rls-data v0.14.0
Compiling rustc_data_structures v0.0.0 (file:///tmp/guix-build-rust-1.24.1.drv-0/rustc-1.24.1-src/src/librustc_data_structures)
Compiling flate2 v1.0.1
Compiling syntax_pos v0.0.0 (file:///tmp/guix-build-rust-1.24.1.drv-0/rustc-1.24.1-src/src/libsyntax_pos)
Compiling rustc_errors v0.0.0 (file:///tmp/guix-build-rust-1.24.1.drv-0/rustc-1.24.1-src/src/librustc_errors)
Compiling backtrace v0.3.4
guix offload: error: timeout expired while offloading '/gnu/store/61bd22d9mg3xl260jwddisiahh3kmanj-rust-1.24.1.drv'
--8<---------------cut here---------------end--------------->8---

Strangely, the build lasts ~9000 seconds (2.5 hours) on the front-end
node of berlin¹, and the timeout for guix-daemon on berlin is 6h (see
guix-maintenance.git) while the max-silent-time is 1h.

The build nodes may be slower than the front-end, but still, it seems
unlikely that it would take more than 6h there. (That could happen if
the test suite, which lasts 2.1h, were “embarrassingly parallel”, but
we’re running tests with ‘-j1’.)

To summarize, there are two problems:

1. Rust takes too long to build. What can we do about it? Enable
parallel builds?

2. Offloaded builds seem to time out prematurely or something.

Thoughts?

Ludo’.

¹ See <https://ci.guix.info/log/rkrnm3rr7g6fhr17160vn1mz5rdzh9lv-rust-1.24.1>
for timings.
Pierre Langlois wrote on 4 Apr 2019 11:28
(address . bug-guix@gnu.org)(name . Ivan Petkov)(address . ivanppetkov@gmail.com)
87bm1mglus.fsf@gmx.com
Hello!

Ludovic Courtès writes:

> Hello,
>
> On berlin, Rust 1.24.1 builds systematically exceed the timeout:
>
> --8<---------------cut here---------------start------------->8---
> Building stage1 compiler artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
> Compiling arena v0.0.0 (file:///tmp/guix-build-rust-1.24.1.drv-0/rustc-1.24.1-src/src/libarena)
> Compiling rustc_driver v0.0.0 (file:///tmp/guix-build-rust-1.24.1.drv-0/rustc-1.24.1-src/src/librustc_driver)
>
> [...]
>
> Compiling rls-data v0.14.0
> Compiling rustc_data_structures v0.0.0 (file:///tmp/guix-build-rust-1.24.1.drv-0/rustc-1.24.1-src/src/librustc_data_structures)
> Compiling flate2 v1.0.1
> Compiling syntax_pos v0.0.0 (file:///tmp/guix-build-rust-1.24.1.drv-0/rustc-1.24.1-src/src/libsyntax_pos)
> Compiling rustc_errors v0.0.0 (file:///tmp/guix-build-rust-1.24.1.drv-0/rustc-1.24.1-src/src/librustc_errors)
> Compiling backtrace v0.3.4
> guix offload: error: timeout expired while offloading '/gnu/store/61bd22d9mg3xl260jwddisiahh3kmanj-rust-1.24.1.drv'
> --8<---------------cut here---------------end--------------->8---
>
> Strangely, the build lasts ~9000 seconds (2.5 hours) on the front-end
> node of berlin¹, and the timeout for guix-daemon on berlin is 6h (see
> guix-maintenance.git) while the max-silent-time is 1h.
>
> The build nodes may be slower than the front-end, but still, it seems
> unlikely that it would take more than 6h there. (That could happen if
> the test suite, which lasts 2.1h, were “embarrassingly parallel”, but
> we’re running tests with ‘-j1’.)
>
> To summarize, there are two problems:
>
> 1. Rust takes too long to build. What can we do about it? Enable
> parallel builds?

One thing I suggested in the past was to remove the check phase *only*
for rust packages used for bootstrapping. This way we still run the
tests for the final rust but not at every step in the chain.

Although, I wonder if we're more likely to miss a bug if we do this, I'm
not sure.


Thanks,
Pierre
Ludovic Courtès wrote on 4 Apr 2019 13:24
control message for bug #35139
(address . control@debbugs.gnu.org)
87y34qrp14.fsf@gnu.org
severity 35139 important
Ivan Petkov wrote on 4 Apr 2019 17:47
Re: bug#35139: Rust builds systematically time out
(address . bug-guix@gnu.org)
101FBDE5-97FA-4449-9076-DD24C56B8715@gmail.com
> On Apr 4, 2019, at 1:59 AM, Ludovic Courtès <ludo@gnu.org> wrote:
>
> The build nodes may be slower than the front-end, but still, it seems
> unlikely that it would take more than 6h there. (That could happen if
> the test suite, which lasts 2.1h, were “embarrassingly parallel”, but
> we’re running tests with ‘-j1’.)
>
> To summarize, there are two problems:
>
> 1. Rust takes too long to build. What can we do about it? Enable
> parallel builds?

Rust tests are designed to run in parallel, as long as you have enough
RAM, file descriptors, etc. available on the machine for the amount of
concurrency being used. The compiler test suite is largely just compiling
files, so the most important resource is probably available RAM/swap.

> On Apr 4, 2019, at 2:28 AM, Pierre Langlois <pierre.langlois@gmx.com> wrote:
>
> One thing I suggested in the past was to remove the check phase *only*
> for rust packages used for bootstrapping. This way we still run the
> tests for the final rust but not at every step in the chain.
>
> Although, I wonder if we're more likely to miss a bug if we do this, I'm
> not sure.

Although that definitely will speed up the bootstrap chain, I’m concerned that
if a dependency package ever gets updated and breaks things we wouldn’t
know without running the test suite.

Maybe if the bootstrapped versions don’t ever change skipping the check
phase will be safe, but I think we should try running parallel tests first
and see how far that gets us.

—Ivan
Ludovic Courtès wrote on 4 Apr 2019 18:06
(name . Ivan Petkov)(address . ivanppetkov@gmail.com)
87imvtrc0g.fsf@gnu.org
Ivan Petkov <ivanppetkov@gmail.com> skribis:

>> On Apr 4, 2019, at 1:59 AM, Ludovic Courtès <ludo@gnu.org> wrote:
>>
>> The build nodes may be slower than the front-end, but still, it seems
>> unlikely that it would take more than 6h there. (That could happen if
>> the test suite, which lasts 2.1h, were “embarrassingly parallel”, but
>> we’re running tests with ‘-j1’.)
>>
>> To summarize, there are two problems:
>>
>> 1. Rust takes too long to build. What can we do about it? Enable
>> parallel builds?
>
> Rust tests are designed to run in parallel, as long as you have enough
> RAM, file descriptors, etc. available on the machine for the amount of
> concurrency being used. The compiler test suite is largely just compiling
> files, so the most important resource is probably available RAM/swap.

Perhaps we could start with:

"-j" (number->string (min (parallel-job-count) 2))

?

> Maybe if the bootstrapped versions don’t ever change skipping the check
> phase will be safe, but I think we should try running parallel tests first
> and see how far that gets us.

Sounds like a good start.

So the only reason we’re running tests sequentially is because of memory
usage concerns?

Thanks,
Ludo’.
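
For illustration, the ‘-j’ expression suggested above could be wired into a package variant roughly as follows. This is a minimal sketch, not the change that was actually applied: the helper name ‘with-parallel-tests’ is hypothetical, and it assumes that the package's test suite is driven by ‘./x.py test’ and that its arguments already contain a ‘#:phases’ entry.

```scheme
(use-modules (guix packages)
             (guix utils))            ; substitute-keyword-arguments

;; Hypothetical helper: return a variant of RUST whose 'check phase runs
;; the test suite with at most two parallel jobs.
(define (with-parallel-tests rust)
  (package
    (inherit rust)
    (arguments
     (substitute-keyword-arguments (package-arguments rust)
       ((#:phases phases)
        ;; The quoted code below runs on the build side, where
        ;; (guix build utils) provides 'invoke' and 'parallel-job-count'.
        `(modify-phases ,phases
           (replace 'check
             (lambda _
               ;; Cap parallelism at 2 to limit memory pressure.
               (invoke "./x.py"
                       "-j" (number->string
                             (min (parallel-job-count) 2))
                       "test")))))))))
```

Capping the job count at 2 is a cautious starting point; the cap could be raised once memory use and EAGAIN-style failures on the build nodes have been checked.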
Ivan Petkov wrote on 4 Apr 2019 19:37
(name . Ludovic Courtès)(address . ludo@gnu.org)
17B412D1-5D9A-40C8-B37E-D8C08F0E9641@gmail.com
Danny’s got a patch for turning on parallel tests in #35126

Not sure why the previous tests were running sequentially, but there is a comment somewhere saying it’s to avoid EAGAIN errors.

--Ivan

> On Apr 4, 2019, at 9:06 AM, Ludovic Courtès <ludo@gnu.org> wrote:
>
> Ivan Petkov <ivanppetkov@gmail.com> skribis:
>
>>> On Apr 4, 2019, at 1:59 AM, Ludovic Courtès <ludo@gnu.org> wrote:
>>>
>>> The build nodes may be slower than the front-end, but still, it seems
>>> unlikely that it would take more than 6h there. (That could happen if
>>> the test suite, which lasts 2.1h, were “embarrassingly parallel”, but
>>> we’re running tests with ‘-j1’.)
>>>
>>> To summarize, there are two problems:
>>>
>>> 1. Rust takes too long to build. What can we do about it? Enable
>>> parallel builds?
>>
>> Rust tests are designed to run in parallel, as long as you have enough
>> RAM, file descriptors, etc. available on the machine for the amount of
>> concurrency being used. The compiler test suite is largely just compiling
>> files, so the most important resource is probably available RAM/swap.
>
> Perhaps we could start with:
>
> "-j" (number->string (min (parallel-job-count) 2))
>
> ?
>
>> Maybe if the bootstrapped versions don’t ever change skipping the check
>> phase will be safe, but I think we should try running parallel tests first
>> and see how far that gets us.
>
> Sounds like a good start.
>
> So the only reason we’re running tests sequentially is because of memory
> usage concerns?
>
> Thanks,
> Ludo’.
mikadoZero wrote on 5 Apr 2019 23:18
(name . Ludovic Courtès)(address . ludo@gnu.org)
cucpnq040ck.fsf@yandex.com
When I try to install rust I get similar behavior. It does not finish
building. The longest I have let it try for was around 12 hours. That
was on a machine with 1 GB of RAM and 10 GB of swap.

Ludovic Courtès writes:

> Hello,
>
> On berlin, Rust 1.24.1 builds systematically exceed the timeout:
>
> --8<---------------cut here---------------start------------->8---
> Building stage1 compiler artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
> Compiling arena v0.0.0 (file:///tmp/guix-build-rust-1.24.1.drv-0/rustc-1.24.1-src/src/libarena)
> Compiling rustc_driver v0.0.0 (file:///tmp/guix-build-rust-1.24.1.drv-0/rustc-1.24.1-src/src/librustc_driver)
>
> [...]
>
> Compiling rls-data v0.14.0
> Compiling rustc_data_structures v0.0.0 (file:///tmp/guix-build-rust-1.24.1.drv-0/rustc-1.24.1-src/src/librustc_data_structures)
> Compiling flate2 v1.0.1
> Compiling syntax_pos v0.0.0 (file:///tmp/guix-build-rust-1.24.1.drv-0/rustc-1.24.1-src/src/libsyntax_pos)
> Compiling rustc_errors v0.0.0 (file:///tmp/guix-build-rust-1.24.1.drv-0/rustc-1.24.1-src/src/librustc_errors)
> Compiling backtrace v0.3.4
> guix offload: error: timeout expired while offloading '/gnu/store/61bd22d9mg3xl260jwddisiahh3kmanj-rust-1.24.1.drv'
> --8<---------------cut here---------------end--------------->8---
>
> Strangely, the build lasts ~9000 seconds (2.5 hours) on the front-end
> node of berlin¹, and the timeout for guix-daemon on berlin is 6h (see
> guix-maintenance.git) while the max-silent-time is 1h.
>
> The build nodes may be slower than the front-end, but still, it seems
> unlikely that it would take more than 6h there. (That could happen if
> the test suite, which lasts 2.1h, were “embarrassingly parallel”, but
> we’re running tests with ‘-j1’.)
>
> To summarize, there are two problems:
>
> 1. Rust takes too long to build. What can we do about it? Enable
> parallel builds?
>
> 2. Offloaded builds seem to time out prematurely or something.
>
> Thoughts?
>
> Ludo’.
>
> ¹ See <https://ci.guix.info/log/rkrnm3rr7g6fhr17160vn1mz5rdzh9lv-rust-1.24.1>
> for timings.
John Soo wrote on 30 Mar 2020 07:42
Rust builds systematically time out
(address . 35139@debbugs.gnu.org)
119E152C-F6C8-4CB1-86C1-F213EB4571C0@asu.edu
Hi everyone,

Is this still happening? It looks like rust-1.24.1 is completing successfully on both ci servers.

- John
Mathieu Othacehe wrote on 18 Dec 2020 11:29
(name . Ludovic Courtès)(address . ludo@gnu.org)
87a6ubl84x.fsf@gnu.org
Hello,

>>> 1. Rust takes too long to build. What can we do about it? Enable
>>> parallel builds?

I've noticed that Rust packages are also built with "-j1". Evaluations
such as: https://ci.guix.gnu.org/eval/19873 are causing rebuilds of many
Rust packages, hence monopolizing the build farm for hours.

Would it be possible to enable parallel building for Rust packages as
suggested by Ludo in this thread?

Thanks,

Mathieu
zimoun wrote on 18 Dec 2020 11:45
868s9vmlyq.fsf@gmail.com
Hi Mathieu,

On Fri, 18 Dec 2020 at 11:29, Mathieu Othacehe <othacehe@gnu.org> wrote:
> Hello,
>
>>>> 1. Rust takes too long to build. What can we do about it? Enable
>>>> parallel builds?
>
> I've noticed that Rust packages are also built with "-j1". Evaluations
> such as: https://ci.guix.gnu.org/eval/19873 are causing rebuilds of many
> Rust packages, hence monopolizing the build farm for hours.
>
> Would it be possible to enable parallel building for Rust packages as
> suggested by Ludo in this thread?

Do parallel builds build reproducibly?

All the best,
simon
Maxim Cournoyer wrote on 21 Nov 2021 07:09
(name . Ludovic Courtès)(address . ludo@gnu.org)
87pmquca5o.fsf@gmail.com
Hello,

Ludovic Courtès <ludo@gnu.org> writes:

> Hello,
>
> On berlin, Rust 1.24.1 builds systematically exceed the timeout:
>
> Building stage1 compiler artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
> Compiling arena v0.0.0 (file:///tmp/guix-build-rust-1.24.1.drv-0/rustc-1.24.1-src/src/libarena)
> Compiling rustc_driver v0.0.0 (file:///tmp/guix-build-rust-1.24.1.drv-0/rustc-1.24.1-src/src/librustc_driver)
>
> [...]
>
> Compiling rls-data v0.14.0
> Compiling rustc_data_structures v0.0.0 (file:///tmp/guix-build-rust-1.24.1.drv-0/rustc-1.24.1-src/src/librustc_data_structures)
> Compiling flate2 v1.0.1
> Compiling syntax_pos v0.0.0 (file:///tmp/guix-build-rust-1.24.1.drv-0/rustc-1.24.1-src/src/libsyntax_pos)
> Compiling rustc_errors v0.0.0 (file:///tmp/guix-build-rust-1.24.1.drv-0/rustc-1.24.1-src/src/librustc_errors)
> Compiling backtrace v0.3.4
> guix offload: error: timeout expired while offloading '/gnu/store/61bd22d9mg3xl260jwddisiahh3kmanj-rust-1.24.1.drv'
>
> Strangely, the build lasts ~9000 seconds (2.5 hours) on the front-end
> node of berlin¹, and the timeout for guix-daemon on berlin is 6h (see
> guix-maintenance.git) while the max-silent-time is 1h.

With the recent improvement in the Rust bootstrap toolchains, I'm
considering this fixed.

If there are still timeouts, Cuirass is now supposed to honor the
'timeout' property.
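
As a rough sketch of what setting that property can look like, the helper below attaches ‘timeout’ and ‘max-silent-time’ properties to a package. The helper name ‘with-build-limits’ is hypothetical, and the values (6 h and 1 h, expressed in seconds) are simply the berlin daemon settings quoted earlier in this thread, used here as placeholders.

```scheme
(use-modules (guix packages))

;; Hypothetical helper: return a variant of P that carries explicit
;; build-time limits, both expressed in seconds.
(define (with-build-limits p)
  (package
    (inherit p)
    (properties
     `((timeout . ,(* 6 3600))         ; overall limit: 6 hours
       (max-silent-time . 3600)        ; limit without output: 1 hour
       ;; Keep any existing properties; entries listed first win on lookup.
       ,@(package-properties p)))))
```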

Closing!

Maxim
Closed