confusing mcron logging

Status: Done
Participants: Ludovic Courtès, Maxim Cournoyer, Dale Mellor, Robert Vollmert
Owner: unassigned
Submitted by: Robert Vollmert
Severity: normal
Robert Vollmert wrote on 5 Jul 2019 15:35
(address . bug-guix@gnu.org)
90FD0C85-F140-420C-AD90-3C2776D8B8D0@vllmrt.net
I have two mcron jobs on my system, certbot renewal and
a handwritten and currently buggy guile job. This is an
excerpt from /var/log/mcron.log:

>>>>>

Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator webroot, Installer None
Cert not yet due for renewal
Keeping the existing certificate

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Certificate not yet due for renewal; no action taken.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Acquiring or renewing certificate: garp.vllmrt.net
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator webroot, Installer None
Cert not yet due for renewal
Keeping the existing certificate

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Certificate not yet due for renewal; no action taken.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Acquiring or renewing certificate: garp.vllmrt.net
Backtrace:
9 (apply-smob/1 #<catch-closure 5cf300>)
In ice-9/boot-9.scm:
829:9 8 (catch mcron-error #<procedure 7fe67c318d28 at mcron/s?> ?)
In mcron/scripts/mcron.scm:
99:7 7 (_)
In mcron/base.scm:
234:12 6 (_ #<continuation 5ad660>)
In srfi/srfi-1.scm:
640:9 5 (for-each #<procedure run-job (job)> (#<<job> user: #(?>))
In mcron/base.scm:
186:10 4 (run-job #<<job> user: #("root" "x" 0 0 "System adminis?>)
In ice-9/eval.scm:
293:34 3 (_ #(#(#<directory (mcron scripts mcron) 6a9c80>)))
182:19 2 (proc #(#(#<directory (mcron scripts mcron) 6a9c80>)))
142:16 1 (compile-top-call _ (7 . get-string-all) ((10 (# . #) ?)))
In unknown file:
0 (%resolve-variable (7 . get-string-all) #<directory (mc?>)

ERROR: In procedure %resolve-variable:
Unbound variable: get-string-all
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator webroot, Installer None
Cert not yet due for renewal
Keeping the existing certificate

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Certificate not yet due for renewal; no action taken.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Acquiring or renewing certificate: garp.vllmrt.net
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator webroot, Installer None
Cert not yet due for renewal
Keeping the existing certificate

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Certificate not yet due for renewal; no action taken.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Acquiring or renewing certificate: garp.vllmrt.net

<<<<<

It’s impossible to tell what output is from which job; which jobs succeeded or
didn’t; when they ran.

Suggestions:
- mcron should log the timestamp and a job id of every job when it starts
- mcron should log the timestamp and status and job id of every job when it finishes
- job output should be prefixed by some job id
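
The first two suggestions can be sketched in plain Guile as follows. This is only an illustration of the idea, not mcron code: `timestamp` and `run-annotated` are hypothetical helper names, the job id is an arbitrary string, and `true` stands in for a real job command. Prefixing the job's own output (the third suggestion) is not covered here.

```scheme
;; Illustrative sketch only; none of these helpers exist in mcron.

;; Current local time as an ISO-8601-like string.
(define (timestamp)
  (strftime "%Y-%m-%dT%H:%M:%S" (localtime (current-time))))

;; Run THUNK, logging a line tagged with ID before and after it.
;; THUNK is expected to return the job's exit status.
(define (run-annotated id thunk)
  (format #t "~a ~a: started~%" (timestamp) id)
  (let ((status (thunk)))
    (format #t "~a ~a: finished with status ~a~%" (timestamp) id status)
    status))

;; Example use; "true" is a placeholder for the real job command.
(run-annotated "certbot-renewal"
               (lambda () (status:exit-val (system* "true"))))
```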
Ludovic Courtès wrote on 5 Jul 2019 22:37
(name . Robert Vollmert)(address . rob@vllmrt.net)(address . 36510@debbugs.gnu.org)
87r274jjyy.fsf@gnu.org
Hi,

Robert Vollmert <rob@vllmrt.net> skribis:

> Suggestions:
> - mcron should log the timestamp and a job id of every job when it starts
> - mcron should log the timestamp and status and job id of every job when it finishes
> - job output should be prefixed by some job id

+1! +3 even. :-)

Something that can help debugging to some extent (but is definitely no
substitute for what you suggest above!) is ‘sudo herd schedule mcron’.
I use that to manually run jobs that appear not to work as expected.

Ludo’.
Robert Vollmert wrote on 5 Jul 2019 22:48
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 36510@debbugs.gnu.org)
4E575636-A350-48C0-A20F-6DE7AEB2D5AF@vllmrt.net
> On 5. Jul 2019, at 22:37, Ludovic Courtès <ludo@gnu.org> wrote:
> Something that can help debugging to some extent (but is definitely no
> substitute for what you suggest above!) is ‘sudo herd schedule mcron’.
> I use that to manually run jobs that appear not to work as expected.

That only works for non-guile jobs though as far as I understand, where
'herd schedule mcron' prints a store path.

Maxim Cournoyer wrote on 18 Aug 2021 02:53
(name . Ludovic Courtès)(address . ludo@gnu.org)
87mtpfy3o0.fsf@gmail.com
Hello Robert and Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Robert Vollmert <rob@vllmrt.net> skribis:
>
>> Suggestions:
>> - mcron should log the timestamp and a job id of every job when it starts
>> - mcron should log the timestamp and status and job id of every job when it finishes
>> - job output should be prefixed by some job id
>
> +1! +3 even. :-)

I've sent a patch upstream that implements all of the above [0]. I've
been using it on my system, it works well so far! I'm also keeping this
work in a public Notabug git repo [1].

Hopefully it gets merged and Guix System can reap the benefits :-).

Thanks for the suggestions!

Maxim

Ludovic Courtès wrote on 30 Aug 2021 11:49
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87eeabdzyc.fsf@gnu.org
Hello Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Hi,
>>
>> Robert Vollmert <rob@vllmrt.net> skribis:
>>
>>> Suggestions:
>>> - mcron should log the timestamp and a job id of every job when it starts
>>> - mcron should log the timestamp and status and job id of every job when it finishes
>>> - job output should be prefixed by some job id
>>
>> +1! +3 even. :-)
>
> I've sent a patch upstream that implements all of the above [0]. I've
> been using it on my system, it works well so far! I'm also keeping this
> work in a public Notabug git repo [1].

That’s a much welcome improvement, thank you!

Ludo’.
Dale Mellor wrote on 4 Jan 2022 14:21
Re: [PATCH v3] base: Annotate output with job information.
0b026b7cd95151875bf47958fc70b52764816d71.camel@rdmp.org
Hi, sorry for the delay but I've had a bit of time over Christmas
to look things over. I've given this a lot of consideration.


I am happy to drop compatibility with guile-2.2 and older; I
think we can make a minor version bump for this break with
legacy.



Does this belong in mcron? The mcron source code is currently
3,000 lines, to which you are bringing over 500 new ones to
make a facility which is geared towards debugging in the GUIX
system (I am all-in on GUIX myself, but mcron is a generic GNU
program with use-cases outside of this system). I wonder if
this is the best place: perhaps it is shepherd, which is
responsible for the /var/log/mcron.log file, to be responsible
for the amended logging messages? And then again, isn't this
exactly what syslogd does anyway? Most likely timings will be
more accurate if they are generated in mcron.

In your use-case, of debugging the system, I would think that
more specialized messages placed directly in the cron jobs
themselves would be a better aid to your work, as you can
target them to the problem at hand. And you could send those
to syslogd if you wanted.
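
For illustration, a job that reports to syslog by itself might look like the sketch below. The schedule, the `updatedb` command and the `updatedb` tag are placeholders chosen for this example, and it assumes a `logger` program is available on the job's PATH.

```scheme
;; Hypothetical mcron job file entry, for illustration only.
;; The action is a shell command string; its stdout and stderr are
;; piped to logger(1), which forwards each line to syslog under the
;; tag "updatedb".
(job '(next-hour '(3))
     "updatedb 2>&1 | logger -t updatedb"
     "updatedb")
```

With this approach the job decides what gets logged and where, independently of how mcron annotates its own output.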



The output is a little unpredictable. The script (which is
admittedly somewhat pathological)

(job '(next-second '(0 30)) '(begin (display "test: ")
(system "date")))

produces

2022-01-04T11:24:00 (...): running...
2022-01-04T11:24:00 (...): Tue 4 Jan 11:24:00 GMT 2022
2022-01-04T11:24:00 (...): test: completed in 0.022s
2022-01-04T11:24:30 (...): running...
2022-01-04T11:24:30 (...): Tue 4 Jan 11:24:30 GMT 2022
2022-01-04T11:25:00 (...): running...
2022-01-04T11:25:00 (...): Tue 4 Jan 11:25:00 GMT 2022
...



But all things considered your changes are generally useful to
have, including outside of the GUIX system, and I would very
much like to have them there. But to be sure not to break any
existing applications, I would like the changes to be opt-in
via a command-line switch -l; the --log-format option can
remain to customize this (please also make -L a short option
alternative; also -D as short for --date-format).

I am willing and able to do this work myself in a reasonable
time-frame if you would like me to.



Best wishes, Dale
Maxim Cournoyer wrote on 21 Nov 2022 02:22
Re: bug#36510: confusing mcron logging
(name . Dale Mellor)(address . mcron-lsfnyl@rdmp.org)(address . 36510-done@debbugs.gnu.org)
874jut2jam.fsf_-_@gmail.com
Hello Dale,

Dale Mellor <mcron-lsfnyl@rdmp.org> writes:

> Hi, sorry for the delay but I've had a bit of time over Christmas
> to look things over. I've given this a lot of consideration.

Apologies for my lack of reply thus far; it seems your mail had fallen
through the cracks.

>
> I am happy to drop compatibility with guile-2.2 and older; I think we
> can make a minor version bump for this break with legacy.
>
>
>
> Does this belong in mcron? The mcron source code is currently
> 3,000 lines, to which you are bringing over 500 new ones to
> make a facility which is geared towards debugging in the GUIX
> system (I am all-in on GUIX myself, but mcron is a generic GNU
> program with use-cases outside of this system). I wonder if
> this is the best place: perhaps it is shepherd, which is
> responsible for the /var/log/mcron.log file, to be responsible
> for the amended logging messages? And then again, isn't this
> exactly what syslogd does anyway? Most likely timings will be
> more accurate if they are generated in mcron.

Since version 0.9, Shepherd adds logging information to every output
it handles, so this feature has indeed become less important, but it is
still useful: I've recently bumped our package of mcron in Guix and I'm
using its annotation facility to prepend the process ID to its output. I
think the bulk of the new lines must be documentation and test code, so
it's not as bad as it seems.

> In your use-case, of debugging the system, I would think that
> more specialized messages placed directly in the cron jobs
> themselves would be a better aid to your work, as you can
> target them to the problem at hand. And you could send those
> to syslogd if you wanted.

Here's a sample output from the Guix build farm:

2022-11-21 01:56:15 84005 /gnu/store/ypyz886hd7qaw0g8ba5a595dc0qgnj3q-update-guix.gnu.org: running...
2022-11-21 01:59:24 84005 /gnu/store/ypyz886hd7qaw0g8ba5a595dc0qgnj3q-update-guix.gnu.org: Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
2022-11-21 01:59:24 84005 /gnu/store/ypyz886hd7qaw0g8ba5a595dc0qgnj3q-update-guix.gnu.org: Computing Guix derivation for 'x86_64-linux'...
2022-11-21 01:59:24 84005 /gnu/store/ypyz886hd7qaw0g8ba5a595dc0qgnj3q-update-guix.gnu.org: [2022-11-21T01:56:18+0100] building web site from 'https://git.savannah.gnu.org/git/guix/guix-artwork.git'...
2022-11-21 01:59:24 84005 /gnu/store/ypyz886hd7qaw0g8ba5a595dc0qgnj3q-update-guix.gnu.org: completed in 189.325s
2022-11-21 02:00:00 91665 /gnu/store/xsc4x68avp8nmrf3hgvhd26yl3k90jqz-check-disk-space: running...
2022-11-21 02:00:00 91665 /gnu/store/xsc4x68avp8nmrf3hgvhd26yl3k90jqz-check-disk-space: completed in 0.046s

The timestamp is now generated by Shepherd, and mcron adds the PID of
the job, such as 84005 above. Seeing at a quick glance how long a job
ran is very useful for admin purposes.

>
>
> The output is a little unpredictable. The script (which is
> admittedly somewhat pathological)
>
> (job '(next-second '(0 30)) '(begin (display "test: ")
> (system "date")))
>
> produces
>
> 2022-01-04T11:24:00 (...): running...
> 2022-01-04T11:24:00 (...): Tue 4 Jan 11:24:00 GMT 2022
> 2022-01-04T11:24:00 (...): test: completed in 0.022s
> 2022-01-04T11:24:30 (...): running...
> 2022-01-04T11:24:30 (...): Tue 4 Jan 11:24:30 GMT 2022
> 2022-01-04T11:25:00 (...): running...
> 2022-01-04T11:25:00 (...): Tue 4 Jan 11:25:00 GMT 2022
> ...

I've noticed that too, that some jobs somehow escape producing the
"completed in x..." message. I'll try looking into that, it's probably
a subtle bug.

> But all things considered your changes are generally useful to
> have, including outside of the GUIX system, and I would very
> much like to have them there. But to be sure not to break any
> existing applications, I would like the changes to be opt-in
> via a command-line switch -l; the --log-format option can
> remain to customize this (please also make -L a short option
> alternative; also -D as short for --date-format).
>
> I am willing and able to do this work myself in a reasonable
> time-frame if you would like me to.

Thank you for taking the above work upon yourself, Dale! I was happily
surprised to see that this change had landed with your improvements on top.

I think this Guix issue can now be closed :-).

--
Thanks,
Maxim
Closed
Maxim Cournoyer wrote on 29 Nov 2022 04:31
(name . Dale Mellor)(address . mcron-lsfnyl@rdmp.org)(address . 36510@debbugs.gnu.org)
87o7sqqvvp.fsf@gmail.com
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Dale Mellor <mcron-lsfnyl@rdmp.org> writes:

[...]

>> The output is a little unpredictable. The script (which is
>> admittedly somewhat pathological)
>>
>> (job '(next-second '(0 30)) '(begin (display "test: ")
>> (system "date")))
>>
>> produces
>>
>> 2022-01-04T11:24:00 (...): running...
>> 2022-01-04T11:24:00 (...): Tue 4 Jan 11:24:00 GMT 2022
>> 2022-01-04T11:24:00 (...): test: completed in 0.022s
>> 2022-01-04T11:24:30 (...): running...
>> 2022-01-04T11:24:30 (...): Tue 4 Jan 11:24:30 GMT 2022
>> 2022-01-04T11:25:00 (...): running...
>> 2022-01-04T11:25:00 (...): Tue 4 Jan 11:25:00 GMT 2022
>> ...

I tried reproducing this, but couldn't, using the latest GNU Shepherd as
shipped in Guix.

> I've noticed that too, that some jobs somehow escape producing the
> "completed in x..." message. I'll try looking into that, it's probably
> a subtle bug.

I took some time to look at the issue, and it was more straightforward
than I had hoped: I was using exec in my job, which was basically
hijacking mcron's forked job process and losing what it would normally
have done upon completion (printing the status). Turning the 'execl'
calls into 'system*' fixed it:

modified guix/hurd.scm
@@ -36,14 +36,14 @@
;; Run 'updatedb' at 3AM every day.
#~(job '(next-hour '(3))
(lambda ()
- (execl #$(file-append findutils "/bin/updatedb") "updatedb"
- (string-append "--prunepaths="
- "/gnu/store "
- "/media "
- "/mnt "
- "/tmp "
- "/var/tmp "
- "/var/lib ")))
+ (system* #$(file-append findutils "/bin/updatedb")
+ (string-append "--prunepaths="
+ "/gnu/store "
+ "/media "
+ "/mnt "
+ "/tmp "
+ "/var/tmp "
+ "/var/lib ")))
"updatedb"))
(define btrfs-balance-job
@@ -52,15 +52,15 @@
;; low (5%) to minimize wear on the SSD. Runs at 5 AM every 3 days.
#~(job '(next-hour-from (next-day (range 1 31 3)) '(5))
(lambda ()
- (execl #$(file-append btrfs-progs "/bin/btrfs") "btrfs"
- "balance" "start" "-dusage=5" "/"))
+ (system* #$(file-append btrfs-progs "/bin/btrfs")
+ "balance" "start" "-dusage=5" "/"))
"btrfs-balance"))
(define btrbk-job
#~(job '(next-hour)
(lambda ()
- (execl #$(file-append btrbk "/bin/btrbk") "btrbk"
- "-q" "-c" #$(local-file "btrbk.conf") "run"))
+ (system* #$(file-append btrbk "/bin/btrbk")
+ "-q" "-c" #$(local-file "btrbk.conf") "run"))
"btrbk"))
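
The mechanism can be reproduced outside mcron. The following is a minimal sketch in plain Guile, not mcron's actual code: `run-with-status` is a hypothetical stand-in for whatever mcron's forked job process does around a job, and a `true` program is assumed to be on PATH.

```scheme
;; Minimal sketch, NOT mcron's implementation.  RUN-WITH-STATUS is a
;; hypothetical wrapper: it forks, runs THUNK in the child, and then
;; prints a completion line from that same child process.
(define (run-with-status name thunk)
  (let ((pid (primitive-fork)))
    (if (zero? pid)
        ;; Child: run the job, then report and exit.
        (begin
          (thunk)
          (format #t "~a: completed~%" name)
          (force-output)
          (primitive-exit 0))
        ;; Parent: wait for the child to terminate.
        (waitpid pid))))

;; execlp replaces the child's process image with "true", so control
;; never returns to the wrapper and the completion line is skipped.
(run-with-status "exec-job"
                 (lambda () (execlp "true" "true")))

;; system* runs "true" as a separate process and returns afterwards,
;; so the wrapper goes on to print its completion line.
(run-with-status "system-job"
                 (lambda () (system* "true")))
```

Run as a script with guile, only the second call should print its completion line, which matches the missing "completed in ..." messages described above.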

--
Thanks,
Maxim