Overnight shutdowns
Advice from Sciss:
- you should try to get a clean shutdown. even with all of today's fancy things like HFS+ journaling etc., you will likely end up with problems after a while if you don't.
- for the only installation i did for a museum, we had a power-"bus"-system that was programmed: a kind of BASIC / assembler program that can switch off power but also trigger a relay. we are reading a certain usb-keyboard key which is sent five or so minutes before the actual power is taken away. this way the computers can be safely shut down.
Advice from David:
- Certainly make sure shutdown is orderly. Decades of unix on all sorts of boxes tells me, "whenever possible, don't just pull the plug!"
Checking for server crashes and dealing with them
These won't happen often, of course :) but installations typically need a lot of uptime when you add it all up, so you need to be prepared.
One way to watch for a server crash (recommended by Sciss):
Updater( Server.default, { arg server, what ... args;
    if( what === \serverRunning and: { server.serverRunning.not }, {
        "Yukk, gotta reboot!".debug
    })
});
...and modified by felix so it doesn't panic too quickly (the server may occasionally fail to respond quickly, e.g. if loading lots of buffers)...
Updater(server, { arg s, message;
    if(message == \serverRunning and: { s.serverRunning.not }, {
        AppClock.sched(5.0, { // don't panic too quickly
            if(s.serverRunning.not, {
                "Server dead ?".inform;
                //this.cmdPeriod; // stop everything, she's dead
            })
        })
    });
})
Remove the updater before you quit the server when the installation is shut down (i assume it's shut down once a day? i would recommend this!). i recommend setting Server -> aliveThreadPeriod to something like 2 seconds, so that if the server gets really busy by accident, it doesn't cause your code to try to boot and reinitialize everything.
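A minimal sketch of those two suggestions together. It assumes the Updater class used in the snippets above is installed, and that your SuperCollider version lets you set aliveThreadPeriod directly on the Server object (check the Server help file for your version). The names ~server, ~watcher and ~shutDownServer are made up for illustration:

```supercollider
(
~server = Server.default;

// poll the server status every 2 seconds (the value suggested above)
~server.aliveThreadPeriod = 2;

// keep a reference to the updater so it can be removed later
~watcher = Updater(~server, { arg s, message;
    if(message == \serverRunning and: { s.serverRunning.not }, {
        "Server dead ?".inform;
    });
});

// at the end of the day: remove the watcher first, then quit,
// so the intentional quit is not treated as a crash
~shutDownServer = {
    ~watcher.remove;
    ~server.quit;
};
)
```

Call ~shutDownServer.value from your own shutdown code, before storing logs or powering down.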
Here's another way (by Joshua):
a = {
    // my 40-50 lines of code
};
a.value; // run the piece
Routine.run({
    loop({
        s.serverRunning.not.if({
            s.boot;
            s.doWhenBooted({
                ("Rebooted at " ++ Date.getDate).postln;
                a.value;
            })
        }, {
            "still running".postln; // don't actually include this... just handy to make sure the Routine works
        });
        10.wait; // do this every 10 seconds
    })
})
Logging
(Recommended by Sciss):
- store a log file of the starting and stopping times of your installation along with the post window contents every day, so you can come back after a week and see if all is fine. or if someone calls you up saying "hello, we don't hear anything anymore, something's broken", you can check those logs.
here's my code for that:
// note : must be called on the AppClock thread
*storePostWin {
    var d, f;
    d = Document.listener;
    try {
        f = File( dataFolder ++ "log" ++ Date.getDate.stamp ++ ".txt", "w" );
        f.write( d.text );
        f.close;
    } { arg error;
        error.postln;
    };
}
and the shutdown method:
shutDown { arg powerDown = true;
    "Cleaning up...".inform;
    // ... (your clean up code here)
    "Quitting server...".inform;
    server.quit;
    "Storing post window log...".inform;
    this.class.storePostWin;
    if( powerDown, {
        "Shutting down computer...".inform;
        unixCmd( "osascript -e 'tell application \"Finder\" to shut down'" );
    });
}
If your machine is on the internet, you can send your logs by email ;-)) i have never done this though
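The start and stop times mentioned above could be recorded with something like the following sketch. It is not part of Sciss's code: the log path and the names ~logPath and ~logEvent are hypothetical, and the file is opened in append mode so each day's entries accumulate in one file.

```supercollider
(
// hypothetical log location; point this at your own data folder
~logPath = Platform.defaultTempDir ++ "installation-times.log";

// append one timestamped line to the log file
~logEvent = { arg what;
    var f = File(~logPath, "a");
    if(f.isOpen, {
        f.write(Date.getDate.stamp ++ " " ++ what ++ "\n");
        f.close;
    }, {
        ("Could not open " ++ ~logPath).warn;
    });
};

~logEvent.value("installation started");
// ... later, from your shutdown code:
~logEvent.value("installation stopped");
)
```

Because the file is closed after every write, a crash or power cut between events should not lose the lines already written.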
Configuring things for stability
Sciss said:
- i recommend setting Server.default.options.blockAllocClass_( ContiguousBlockAllocator ). it's more robust against fragmentation of indices, so if you're doing a lot of dynamic buffer management i think you should use this class.
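As a sketch, the setting would normally go in your startup code before the server is booted. This assumes the allocator class is only picked up when the server creates its allocators at boot, which is worth verifying for your SuperCollider version:

```supercollider
(
s = Server.default;
// choose the allocator before booting, so the server's
// buffer and bus allocators are created with this class
s.options.blockAllocClass_(ContiguousBlockAllocator);
s.boot;
)
```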
David said:
- There's a fair bit of 'the black arts' in this stuff so what seems logical and what works can be two different things. I always stagger the unit power-up so not more than one machine is spiking the mains at exactly the same time. Another thing that can affect power-up/down state is screensaver/power adaptor/sleep options. For testing turn all the "saving" options off.
Tommi said:
- Some time ago I did (most of) the technical design and the implementation of a sound installation that's supposed to run with zero maintenance for as long as possible. The actual installation was realized with a doctored SC running on Linux (Ubuntu server was the OS of choice) on server hardware. The installation gets started after boot and is restarted once a week (with crond). The sclang is checking that scsynth keeps running and, if needed, restarts it (zombification of course cannot be detected this way). Disk usage is minimized: most crond'ed maintenance operations (updatedb etc.) and unnecessary processes were turned off, and the installation itself uses only stuff read into memory (and we made sure everything fits well into the physical memory). Then there's a script as a watchdog for sclang. For the case of a hardware failure there are two identical computers in reserve; a complete reinstallation/cloning (assuming the original hardware, e.g. the original computer with a new hard disk) takes about 15 minutes and is really simple to do. For the worst case scenario there's also the source code and build instructions for everything minus the OS. The installation has been running without any hiccups (no need for manual reboot etc.) since its inception last July, and I would dare to say that the "IT" side of the project has proven to be solid. Waiting with curiosity to see which component is the first to break.
Testing
(Recommended by Sciss):
- run your installation for a whole day in your place before you move it to the site. often errors occur only after running your code for a long time, because improbabilities integrate over time and numbers exceed boundaries (as an example: when you are measuring time in integer sample frames, you'll run into the limit of signed 32-bit integers after around 13.5 hours at 44.1 kHz, or around 6 hours at 96 kHz, and if you do arithmetic with them you can end up with calculation errors).
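The sample-frame limit is easy to check: a signed 32-bit counter holds at most 2^31 - 1 frames, so dividing 2^31 by the sample rate gives the running time before it overflows (roughly 13.5 hours at 44.1 kHz and a little over 6 hours at 96 kHz). A small sketch of that arithmetic follows; the last line is there because sclang Integers are 32-bit, and I would expect the addition to wrap around to a negative number rather than raise an error:

```supercollider
(
var maxFrames = 2.0 ** 31;  // as a float: one past the largest signed 32-bit integer
[44100, 48000, 96000].do { |sr|
    var hours = maxFrames / sr / 3600;
    "% Hz: overflow after about % hours".format(sr, hours.round(0.01)).postln;
};
// integer arithmetic past the limit: inspect what comes back
(2147483647 + 1).postln;
)
```

If the installation has to run longer than that, keeping time in floats (seconds) or resetting the counter periodically avoids the problem.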
Autoboot Linux SC
See here: Autoboot into emacs -sclang