Large-scale Automated Unix Sysadmin
Imagining a framework for managing and automating system administration of Unix machines on a large scale. That is, enterprise-wide, possibly thousands of machines, across several differing contexts and audiences. What are our design requirements? What do we need it to do?
The system or framework for this large-scale systems administration:
- Must be modular
- Must allow for compartmentalization
- Must allow for differing needs of the various groups covered, in terms of their:
  - Differing security profiles
  - Differing risks
  - Differing cost/benefit postures
- Must provide for sharing of universal techniques where they exist
  - ie, where all groups agree on "foo", they can all use the same module (piece of code) to do foo, rather than requiring each to implement it separately
  - ie each group can subscribe to a module that is managed/authored by the central source
- Must allow for differing techniques where they are needed
  - ie, where not all groups agree on "bar", each group can have its own module to do (or not do) bar, rather than requiring all to do it
  - ie use a module that is managed/authored locally
- Must support use of others' techniques where desired
  - ie, where some but not all groups agree on "bat", they can share the use of a module, rather than requiring each to maintain their own local module
  - ie subscribe to a module that is managed/authored by another group (not the central source)
- Should support shared maintenance of common (but not universal) techniques
  - ie, where some but not all agree on "baz", they can share the use *and maintenance* of (write-access to) a module, rather than requiring only one to maintain it
  - ie subscribe to a piece of code that's managed/authored by multiple authors together
- Should support inter-module dependencies and conflicts
  - ie, a module could "require" another module, or "conflict" with another module, or "provide" a pseudo module or equivalent functionality of another module to satisfy a dependency
- Should allow for modules that:
  - make a change once, ie for a baseline/starting point, ie kickstart
  - maintain a change, ie assumes that this change will always be there
  - enforce a change, ie makes sure that the change is always there
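
The "require", "conflict", and "provide" relationships described above could be checked roughly as follows. This is only a sketch: the `Module` class, its field names, and `check_module_set` are hypothetical names made up for illustration, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical module metadata (field names are assumptions)."""
    name: str
    requires: frozenset = frozenset()
    provides: frozenset = frozenset()
    conflicts: frozenset = frozenset()


def check_module_set(modules):
    """Return a sorted list of problems found in a proposed set of modules."""
    # A capability is available if it is a module's own name
    # or a pseudo module that some module "provides".
    available = {}
    for m in modules:
        for cap in {m.name} | m.provides:
            available.setdefault(cap, set()).add(m.name)

    problems = []
    for m in modules:
        for need in sorted(m.requires):
            if need not in available:
                problems.append(f"{m.name} requires {need}, which nothing provides")
        for bad in sorted(m.conflicts):
            # A module never conflicts with itself through its own provides.
            for other in sorted(available.get(bad, set()) - {m.name}):
                problems.append(f"{m.name} conflicts with {bad} (from {other})")
    return sorted(problems)


# Example: "foo" needs some mail transfer agent; "postfix-config" provides one.
modules = [
    Module("foo", requires=frozenset({"mta"})),
    Module("postfix-config", provides=frozenset({"mta"}),
           conflicts=frozenset({"sendmail-config"})),
]
print(check_module_set(modules))
```

Returning a sorted list of problems, rather than stopping at the first one, lets a group see everything wrong with a proposed subscription set in one pass.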
Other points:
- It must use strong cryptographic authentication, for example:
  - ssh-rsa keys
  - gpg signatures
  - ssl keys/certs
  - kerberos keytabs
- It must use that authentication for:
  - distributed machines authenticating central servers when requesting files/etc (eg, webserver ssl certs, kerberos keytabs)
  - central servers authenticating distributed machines that request config files (eg, ssl client keys, kerberos kinit, ssh-rsa keys)
  - distributed machines authenticating scripts or pkgs made available by central servers (eg, gpg signatures)
  - central servers pushing files, scripts, or pkgs to distributed machines (eg, ssh-rsa keys, kerberos kinit)
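
As one illustration of the ssl keys/certs case, mutual authentication between a distributed machine and a central server might be set up along these lines with Python's standard `ssl` module. The function names and file-path parameters are assumptions for the sketch; the other mechanisms listed (ssh-rsa keys, gpg signatures, kerberos) are not covered here.

```python
import ssl


def make_client_context(ca_file=None, cert_file=None, key_file=None):
    """TLS context for a distributed machine contacting a central server."""
    # With Purpose.SERVER_AUTH the default context requires a valid server
    # certificate and checks that it matches the hostname.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file is not None:
        # Present a client certificate so the server can authenticate us.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


def make_server_context(cert_file, key_file, client_ca_file):
    """TLS context for a central server that only accepts known machines."""
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH,
                                         cafile=client_ca_file)
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    # Reject any client that does not present a certificate signed by our CA.
    context.verify_mode = ssl.CERT_REQUIRED
    return context
```

Each side authenticates the other: the machine checks the server's certificate before trusting any config file, and the server checks the machine's certificate before handing one out.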
Specific SAS needs:
- Must allow organizing around a whole-picture description of the desired configuration/result ("this is what it's going to look like when it's done"), rather than simply relying on an accumulation of chronological scripts ("do this, do that, wait nevermind undo that first step and do this instead").
- Must be deterministic about the order of actions taken and the results.
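
A minimal sketch of these two needs, assuming configuration can be flattened into key/value pairs (the keys below are invented examples): the desired state is described as a whole, and the plan is derived by comparing it with the current state in a fixed order. Treating anything absent from the description as "to be removed" is an assumption of this sketch, not a stated requirement.

```python
_MISSING = object()  # sentinel so that None can be a legitimate value


def plan_actions(desired, current):
    """Compare desired and current state; return an ordered list of actions."""
    actions = []
    # Sorting the keys makes the plan independent of dict insertion order,
    # so the same inputs always yield the same actions in the same order.
    for key in sorted(set(desired) | set(current)):
        if key not in desired:
            actions.append(("remove", key))
        elif current.get(key, _MISSING) != desired[key]:
            actions.append(("set", key, desired[key]))
    return actions


desired = {"ntp.server": "ntp1.example.com", "sshd.PermitRootLogin": "no"}
current = {"sshd.PermitRootLogin": "yes", "telnetd.enabled": "true"}
for action in plan_actions(desired, current):
    print(action)
```

Running the plan against a machine that already matches the description produces no actions, which is what distinguishes this approach from replaying a chronological pile of scripts.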
Possible implementations:
- rpms
  - provides the inter-module dependencies (requires, provides, conflicts)
  - provides gpg-signed authentication of individual modules
  - supports multiple (simultaneous) repositories, thus the ability to "subscribe" to modules from multiple different sources
  - supports automatic downloading and installing (via up2date and yum); "nightly updates" could be done via cron, either for specific modules, or all modules, or a single "overview" module that merely requires all other necessary modules
  - documented, standard format, usable and buildable by everyone
  - native under Red Hat, supportable for Solaris as well with a bit of effort
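
The "overview" module idea could look something like the following: a small Python helper that writes out the spec file for an empty rpm whose only job is to require the other modules. The helper name, package names, and the License value are placeholders invented for this sketch.

```python
def render_overview_spec(name, version, required_modules,
                         summary="Overview module that requires all other modules"):
    """Render a minimal rpm spec for a package with no files, only Requires."""
    lines = [
        f"Name: {name}",
        f"Version: {version}",
        "Release: 1",
        f"Summary: {summary}",
        "License: Internal",   # placeholder value
        "BuildArch: noarch",
    ]
    # One Requires line per module, sorted so the output is reproducible.
    lines += [f"Requires: {module}" for module in sorted(required_modules)]
    # An empty %files section yields a package that installs nothing itself.
    lines += ["", "%description", summary, "", "%files", ""]
    return "\n".join(lines)


print(render_overview_spec("sas-overview", "1.0", ["sas-sshd", "sas-ntp"]))
```

Installing or updating that one package would then pull in every module it requires, so a nightly cron job only needs to know a single name.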

