Average CLI Script Run Time with grep, sed and awk
Our order processing system at voicearchive has several background tasks that need to run often. For this we have a very nice CLI script, written in PHP, from which these tasks are called. The CLI runs every two minutes, first checking whether the last call has finished; if not, it exits early. To make sure that this script keeps running, we have taken several measures. We know whether the CLI script is already running from a pid file, whose age we monitor every minute. If the age is above a certain threshold for more than a few samples, we trigger some error correction actions and delete the pid file.
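As a rough illustration of the pid-file age check, here is a minimal sketch. The path and the age limit are made up, and the function name pid_file_is_stale is hypothetical; it also relies on find's -mmin test, which GNU and BSD find provide but POSIX does not require.

```shell
#!/bin/sh
# Hypothetical path and limit; neither comes from the real system.
PID_FILE="${PID_FILE:-/tmp/cli.pid}"
MAX_AGE_MINUTES="${MAX_AGE_MINUTES:-10}"

# pid_file_is_stale FILE MINUTES
# Succeeds (exit 0) when FILE exists and was last modified
# more than MINUTES minutes ago.
pid_file_is_stale() {
    [ -f "$1" ] || return 1
    # find prints the path only when the -mmin test matches
    [ -n "$(find "$1" -mmin +"$2" 2>/dev/null)" ]
}

if pid_file_is_stale "$PID_FILE" "$MAX_AGE_MINUTES"; then
    echo "stale pid file: $PID_FILE"
fi
```

A real monitor would count consecutive stale samples before acting, as described above; this sketch only covers a single sample.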
But this error correction and pid deletion is potentially harmful: it could stop an action from running but mark it as already run, or it could run the same action on the same object twice, both of which could have harmful consequences for our customers and suppliers. So we have decided to also monitor the health of the CLI script in another way.
The script creates a log file, and every (successful) run of the script logs a line with the text CLI RUN TIME followed by the run time in seconds as a float. We decided to parse the average run time of the last hour's script runs from the log and alert the appropriate people if this value exceeds a certain threshold.
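A pipeline of this kind could look roughly like the sketch below. The function name average_run_time, the log line layout (a timestamp, then CLI RUN TIME, then the seconds) and the exact sed expression are assumptions made for illustration, not a copy of the production command.

```shell
#!/bin/sh
# average_run_time LOGFILE [LINES]
# Prints the mean of the values that follow "CLI RUN TIME" in the
# last LINES lines of LOGFILE (default 630). Prints nothing when no
# such line is found.
# Assumed line layout: "<timestamp> CLI RUN TIME <seconds>"
average_run_time() {
    tail -n "${2:-630}" "$1" |
        grep 'CLI RUN TIME' |
        sed 's/.*CLI RUN TIME[^0-9]*\([0-9][0-9]*\(\.[0-9][0-9]*\)\{0,1\}\).*/\1/' |
        awk '{ sum += $1; n++ } END { if (n > 0) printf "%.3f\n", sum / n }'
}

# Example call (hypothetical path):
# average_run_time /var/log/cli.log
```

The sed expression drops everything up to and including the marker text, so digits in a timestamp earlier on the line do not end up in the result.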
We use tail -630 because we have learned that this number of lines roughly equals 30 runs, which (when running normally) gives us the last hour.
The grep part filters the file, so that we only get lines containing "CLI RUN TIME".
The sed part filters out all parts that are not a valid float value leaving only the seconds part.
The awk part - which I pretty much just stole from here - sums all the values, and prints out the average value (the sum divided by the number of samples).
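To alert on the result, the average still has to be compared with the threshold. Shell arithmetic is integer only, so one option is to let awk do the float comparison. This is a minimal sketch; the function name exceeds_threshold and the limit of 60 seconds mentioned below are hypothetical.

```shell
#!/bin/sh
# exceeds_threshold VALUE LIMIT
# Exits 0 when VALUE is strictly greater than LIMIT, 1 otherwise.
# Adding 0 forces awk to compare the operands as numbers.
exceeds_threshold() {
    awk -v value="$1" -v limit="$2" 'BEGIN { exit !(value + 0 > limit + 0) }'
}
```

With the average stored in a variable, a check such as `if exceeds_threshold "$avg" 60; then ...; fi` could then trigger whatever alerting is in place.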
