[IRCServices] Log Rotation and -SIGUSR2
Mark Hetherington
mark at mhetherington.demon.co.uk
Fri Dec 14 03:15:03 PST 2001
I was not a subscriber to the list at the time the rotate_log() function was
removed, so I missed the discussion, and I haven't yet come across it in the
archives. For these reasons, some of what I say here may have come up
before.
kill -SIGUSR2 combined with moving the log file does not seem to be the
best way to manage log rotation, at least in my experience. Quite often,
the sequence results in services performing an SQUIT and writing this error
message to its log file:
Read error from server: Success
So far I have been unable to discover what services thinks the error is,
especially since the message itself reports no error.
I have tried various combinations of file operations and the SIGUSR2 signal
with no noticeable change.
The existing debug code has not yet let me ascertain the exact cause of the
problem. I suspect it is a timing issue, with services responding to the
signal at the wrong point during the file system operations (perhaps the
file operation is still in progress, or perhaps the file size has some
effect).
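To make the timing suspicion more concrete, here is a minimal sketch of the usual defensive pattern. Nothing in this post confirms the mechanism. However, a common source of this kind of trouble is a signal handler that does real work (closing and reopening files) while the main loop is blocked in read() on the server socket. Depending on how the handler was installed, read() can then return -1 with errno set to EINTR, and stdio calls such as fopen() are not async-signal-safe under POSIX. In the pattern below, the handler only sets a flag, the main loop does the reopen, and reads are retried on EINTR. The names got_sigusr2, handle_sigusr2, install_sigusr2_handler, reopen_log_if_requested and read_retry are hypothetical and are not taken from the services source.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical flag: set by the handler, acted on by the main loop. */
static volatile sig_atomic_t got_sigusr2 = 0;

/* The handler only records that SIGUSR2 arrived. It does no file I/O,
 * because stdio functions are not async-signal-safe. */
static void handle_sigusr2(int sig)
{
    (void)sig;
    got_sigusr2 = 1;
}

/* Install the handler with SA_RESTART so that most system calls
 * interrupted by the signal (such as read() on a socket) are restarted
 * by the kernel instead of failing with EINTR. */
int install_sigusr2_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handle_sigusr2;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR2, &sa, NULL);
}

/* Belt and braces: retry a read() that was interrupted before any data
 * arrived, so EINTR is never reported as a server read error. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;

    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Called from the main loop, outside signal context. If SIGUSR2 was
 * received, open a new log at the configured path (the old file may
 * already have been moved away) and close the old stream. Returns 1 if
 * the log was reopened, 0 if no request was pending, -1 if the open
 * failed (in which case the old stream is kept). */
int reopen_log_if_requested(FILE **logp, const char *path)
{
    FILE *f;

    if (!got_sigusr2)
        return 0;
    got_sigusr2 = 0;
    f = fopen(path, "a");
    if (!f)
        return -1;
    if (*logp)
        fclose(*logp);
    *logp = f;
    return 1;
}
```

The administrator's side stays the same: move the file, then send SIGUSR2. The difference is that the reopen now happens at a well-defined point in the main loop, rather than at whatever moment the signal happens to interrupt.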
Since the problem is only intermittently reproducible on the production
network, I am limited in how often I can debug it, and I suspect I would be
better off simply rewriting the log/signal processing to use a scheme that
does not suffer from these problems. Sometimes services will run for a
couple of months without a problem; at other times it will have the problem
every day. Again, this does not help when trying to find the cause.
From a quick look at the Services 5 alpha source code, the SIGUSR2/logging
code seems almost identical, so this will likely remain an issue in newer
versions of services.
Although I would be interested in hearing about others' experience and
techniques for managing log rotation, I do have a few suggestions that I
believe would be useful in the main build of services:
1) Restore the original rotate_log() function or provide a new one. Using a
compile-time #define would mean that those who do not want the code
compiled in do not have to. Yes, I can provide my own rotate_log() code, but
with Services 5 coming up and the likely need for more frequent upgrades in
the early days, I would prefer not to have to keep patching the log code.
2) Alter the log naming convention so that it becomes LogFileName.suffix,
where the suffix is generated from the current date, e.g. 20011213. This
would allow the existing SIGUSR2 code to produce dated log files, much as
eggdrop and some IRCd software do in their own log rotation functions, while
making the whole system somewhat less susceptible to other issues. As in
eggdrop, this could be a simple configuration option, so that users of
services who do not use daily logs can turn it off.
3) The error reporting needs improvement. "Read error from server:
Success" is not a useful error message; it should report the nature of the
error.
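For suggestion 1), a restored rotate_log() could be as small as the sketch below. The behaviour of the original function is not described here, so this is an assumed scheme. It keeps numbered backups (LogFileName.1, LogFileName.2, ...) and uses a hypothetical ROTATE_LOG_SUPPORT macro as the compile-time #define.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical compile-time switch. A real build would set it from the
 * configuration (e.g. -DROTATE_LOG_SUPPORT); it is defined here only so
 * that this sketch compiles on its own. */
#ifndef ROTATE_LOG_SUPPORT
#define ROTATE_LOG_SUPPORT 1
#endif

#ifdef ROTATE_LOG_SUPPORT
/* Shift "path.1" .. "path.<keep-1>" up by one (overwriting the oldest),
 * move "path" to "path.1", then reopen a fresh log at "path".
 * Paths longer than the buffers are truncated; a real version should
 * check. Returns 0 on success, -1 if a rename or the reopen failed. */
int rotate_log(FILE **logp, const char *path, int keep)
{
    char from[1024], to[1024];
    int i, rc = 0;

    if (keep < 1)
        return -1;
    if (*logp) {
        fclose(*logp);          /* flush and release the old stream */
        *logp = NULL;
    }
    for (i = keep - 1; i >= 1 && rc == 0; i--) {
        snprintf(from, sizeof from, "%s.%d", path, i);
        snprintf(to, sizeof to, "%s.%d", path, i + 1);
        if (rename(from, to) != 0 && errno != ENOENT)
            rc = -1;            /* a missing older file is not an error */
    }
    snprintf(to, sizeof to, "%s.1", path);
    if (rc == 0 && rename(path, to) != 0 && errno != ENOENT)
        rc = -1;
    /* Always try to reopen so logging continues even after a failure. */
    *logp = fopen(path, "a");
    return (rc == 0 && *logp) ? 0 : -1;
}
#endif
```

Because the rename and reopen happen inside services, at a point of its own choosing, the move and the reopen cannot interleave the way an external mv and a signal can.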
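Suggestion 2) could be sketched as follows. The sketch assumes the LogFileName.YYYYMMDD format from the example above and uses local time for the date. Services checks the name before writing and switches files when the date changes, so no external move or signal is needed. The names dated_log_name(), daily_log_update() and struct daily_log are hypothetical, and the configuration option that would enable this is not shown.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical state for a daily log: the open stream and its name. */
struct daily_log {
    FILE *fp;
    char name[1024];
};

/* Build "<base>.<YYYYMMDD>" for time t, using the local time zone.
 * Returns 0 on success, -1 if the name does not fit in buf. */
int dated_log_name(char *buf, size_t len, const char *base, time_t t)
{
    struct tm tm;
    char date[9];               /* "YYYYMMDD" plus terminating NUL */
    int n;

    if (!localtime_r(&t, &tm))
        return -1;
    if (strftime(date, sizeof date, "%Y%m%d", &tm) == 0)
        return -1;
    n = snprintf(buf, len, "%s.%s", base, date);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Call before writing a log line. If no file is open yet, or the date
 * has changed since the current file was opened, switch to the file for
 * today. Returns 1 if a new file was opened, 0 if nothing changed, and
 * -1 on error (any existing stream is left open). */
int daily_log_update(struct daily_log *dl, const char *base, time_t now)
{
    char name[sizeof dl->name];
    FILE *f;

    if (dated_log_name(name, sizeof name, base, now) != 0)
        return -1;
    if (dl->fp && strcmp(name, dl->name) == 0)
        return 0;
    f = fopen(name, "a");
    if (!f)
        return -1;
    if (dl->fp)
        fclose(dl->fp);
    dl->fp = f;
    strcpy(dl->name, name);
    return 1;
}
```

Because each day gets a new file name instead of the current file being moved, services is never writing to a file that is being renamed underneath it.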
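For suggestion 3): on glibc systems, strerror(0) returns "Success". The message above therefore suggests that errno was 0 when the error was reported. One possibility, which this post does not confirm, is that read() returned 0 because the server closed the connection, a case that sets no errno at all. A hypothetical helper that separates these cases, and copies errno immediately after read(), might look like the sketch below. The names describe_read_result() and read_server() are assumptions and do not come from the services source.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: turn the result of a read() on the server socket
 * into a specific message instead of always printing strerror(errno). */
const char *describe_read_result(ssize_t n, int saved_errno,
                                 char *msg, size_t len)
{
    if (n == 0)
        snprintf(msg, len, "Connection closed by server (EOF)");
    else if (n < 0 && saved_errno == 0)
        snprintf(msg, len, "Read failed but errno was not set");
    else if (n < 0)
        snprintf(msg, len, "Read error from server: %s (errno %d)",
                 strerror(saved_errno), saved_errno);
    else
        snprintf(msg, len, "Read %ld bytes", (long)n);
    return msg;
}

/* Read once from the server. errno is copied immediately after read()
 * so that later library calls (e.g. writing the log) cannot change it
 * before it is reported. On EOF or error, msg describes what happened. */
ssize_t read_server(int fd, char *buf, size_t len, char *msg, size_t msglen)
{
    ssize_t n = read(fd, buf, len);
    int saved_errno = errno;

    if (n <= 0)
        describe_read_result(n, saved_errno, msg, msglen);
    return n;
}
```

Even if this does not reveal the root cause, a log line that says "Connection closed by server" or gives a real errno value would narrow the search considerably.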
Should I manage to extract any further information about the cause of the
problem, I will notify the group. But after months of this issue occurring
on and off, with testing possible only about once a day, it does seem far
easier simply to rewrite the code that is causing the problem.
Mark.