UNDER CONSTRUCTION - needs splitting into "GOAL -> SOLUTION -> IMPLEMENTATION STEPS/MATURITY"
1. Best Practices Audit Checklist
All areas are described as a progression, or timeline, from chaotic, ad hoc work in a smaller organisation to mature, best-practice working methods in a larger or better organised workplace:
goal - uptime/uninterrupted service: keep services available at all times
disk redundancy -> offline (tape) backups -> online (disk) backups -> failover box/site (manual) -> failover box/site (automatic) -> load-balanced cluster -> failover clusters -> automatic pickup by failover site -> business continuity planning
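The step from manual to automatic failover comes down to a decision rule that runs without a human: use the preferred site while it is healthy, otherwise move to the next one. The following is a minimal sketch of that rule only. It assumes a hypothetical `is_healthy` check supplied by the monitoring system and a preference-ordered list of site names. Actually redirecting traffic (DNS, VIPs, load balancer pools) is not shown.

```python
from typing import Callable, Optional, Sequence


def choose_active(sites: Sequence[str], is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the highest-priority site that passes its health check.

    `sites` is ordered by preference: primary first, then failover sites.
    `is_healthy` is a hypothetical hook into whatever monitoring is in place.
    Returns None when nothing is healthy; that case should raise an alert
    rather than fail over silently.
    """
    for site in sites:
        if is_healthy(site):
            return site
    return None


# Example: the primary is down, so the first healthy failover site is chosen.
status = {"primary": False, "failover-a": True, "failover-b": True}
print(choose_active(["primary", "failover-a", "failover-b"], lambda s: status[s]))
```

A real implementation would usually also require several consecutive failed checks before switching, so that one dropped probe does not cause flapping between sites.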
goal - performance and optimization: doing the best with what you have
application configuration -> filesystem -> network -> load balancing
goal - progress and improvement: you don't have to stand still and firefight; you want to take on projects that move the infrastructure forward
dedicated helpdesk -> project/callout days split -> review of helpdesk tickets for project selection -> auditing for improvement
goal - backups
single backup -> offsite storage -> periodic backups -> backup verification -> backup monitoring and alerts -> multiple backup types for different scenarios
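Backup verification can begin as simply as confirming that every file in the source exists in the backup with identical contents. The following is a minimal sketch that compares SHA-256 checksums. It assumes the backup is a plain directory tree that mirrors the source. The function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir: Path, backup_dir: Path) -> list:
    """Return problems found, as 'missing: <path>' or 'differs: <path>' strings.

    An empty list means every source file has an identical copy in the backup.
    """
    problems = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        copy = backup_dir / rel
        if not copy.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif sha256_of(src) != sha256_of(copy):
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

If the source is live and changes after the backup runs, it is more reliable to record checksums at backup time and verify against those. Any non-empty result is exactly what the next step, backup monitoring and alerts, should report.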
goal - troubleshooting
knowledgebase of past problems -> access to user systems -> access to developers or vendors -> training
goal - authentication
passwords -> procedures for changing passwords when staff leave, etc. -> secure passwords -> single sign-on -> security audits and reviews ->
goal - consistency
build sheet -> scripted builds -> configuration management software
goal - time synchronisation
manually sync time -> use network time (NTP) -> in-house network time server -> multiple servers -> multiple monitored servers with alerting
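Monitoring time servers with alerting requires a measure of how far each clock is from a reference. NTP estimates this from four timestamps per exchange. Using the RFC 5905 naming, these are T1 (client transmit), T2 (server receive), T3 (server transmit) and T4 (client receive). The offset is ((T2 - T1) + (T3 - T4)) / 2 and the round-trip delay is (T4 - T1) - (T3 - T2). The following is a minimal sketch of that arithmetic plus a threshold check. It assumes the timestamps have already been collected (for example, from an SNTP query or from the NTP daemon's own statistics). The 0.5 s threshold and the function names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class NtpSample:
    t1: float  # client transmit time (client clock), seconds
    t2: float  # server receive time (server clock)
    t3: float  # server transmit time (server clock)
    t4: float  # client receive time (client clock)

    @property
    def offset(self) -> float:
        """Server clock minus client clock; positive means the client is behind."""
        return ((self.t2 - self.t1) + (self.t3 - self.t4)) / 2

    @property
    def delay(self) -> float:
        """Round-trip network delay, excluding time spent inside the server."""
        return (self.t4 - self.t1) - (self.t3 - self.t2)


def check_offsets(samples: dict, max_offset: float = 0.5) -> list:
    """Return alert messages for every sample whose |offset| exceeds max_offset seconds."""
    return [
        f"{name}: offset {s.offset:+.3f}s"
        for name, s in sorted(samples.items())
        if abs(s.offset) > max_offset
    ]


# A client running 2 s slow, with 0.1 s network delay each way:
sample = NtpSample(t1=98.0, t2=100.1, t3=100.1, t4=98.2)
print(round(sample.offset, 3), round(sample.delay, 3))
```

The offset formula assumes roughly symmetric network paths. That is one reason to query several servers, as the later maturity steps suggest, and to compare their results.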
goal - system documentation
system logs -> central system logging -> full systems audit list -> system diagrams and topology maps -> systems documentation library (disks used, so replacement parts are known; IP address list; server history; past problems; etc.)
goal - process documentation
how-tos for common procedures -> scripts for common tasks
goal - change control
policy on types of change allowed -> log of changes -> established sign-off procedures for common tasks -> scripts/methods for non-standard changes -> rollback plans -> request process for non-standard changes -> proper version control -> authenticated version control with a systematic, role-based approval process
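A change log is more useful once every entry is required to carry a rollback plan. The following is a minimal sketch of an append-only log stored as JSON lines. The field names in `REQUIRED_FIELDS` and the helper names are hypothetical and should be adapted to the local change policy. Sign-off and role-based approval are outside the scope of this sketch.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Hypothetical minimum fields for a change record.
REQUIRED_FIELDS = ("host", "summary", "author", "rollback")


def log_change(log_path: Path, **fields: str) -> dict:
    """Append one change as a JSON line, refusing records with missing or empty required fields."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("change record missing: " + ", ".join(missing))
    record = {"timestamp": datetime.now(timezone.utc).isoformat(), **fields}
    # Append-only: earlier entries are never rewritten.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def changes_for(log_path: Path, host: Optional[str] = None) -> list:
    """Read the log back in order, optionally filtered to a single host."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    return [r for r in records if host is None or r["host"] == host]
```

Storing the log file itself under version control is a natural bridge to the later steps in this progression.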
goal - system security
security, updates and patching
directory, lookup and authentication
audits?
logs, central logging, alerting? - NIDS/HIDS and IPS
firewalls with a minimal set of allowed access
goal - system management: some values live in local config files, and the infrastructure they point to can change without the servers being updated. Examples are DNS nameservers or NTP servers, which are set at machine build time and never changed afterwards. A system is needed to manage these.
manual log of what is set where -> centrally synced copy of configs -> centrally maintained push/pull from the central configs -> config-based push/mount to machines
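Even before configs are pushed centrally, it helps to detect hosts whose build-time values have drifted from the current infrastructure. The following is a minimal sketch for the DNS example, which compares the `nameserver` lines of each host's resolv.conf text against a central list. It assumes the file contents have already been collected (for example, over SSH). The helper names and documentation-range addresses are hypothetical.

```python
def nameservers(resolv_conf: str) -> list:
    """Extract nameserver addresses, in order, from resolv.conf-style text.

    Comment lines start with '#' or ';', so their first token never matches.
    """
    servers = []
    for line in resolv_conf.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def drift(host_configs: dict, expected: list) -> dict:
    """Map each host whose nameservers differ from the central list to what it actually has."""
    result = {}
    for host, text in sorted(host_configs.items()):
        found = nameservers(text)
        if found != expected:
            result[host] = found
    return result


central = ["192.0.2.53", "198.51.100.53"]
configs = {
    "web1": "nameserver 192.0.2.53\nnameserver 198.51.100.53\n",
    "db1": "# set at build time\nnameserver 203.0.113.1\n",
}
print(drift(configs, central))
```

The comparison is order-sensitive on purpose, because resolvers query nameservers in the listed order.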
goal - remote management
on-network in-band access -> VPN in-band access -> on-network out-of-band (OOB) access to hosts (ILOM) -> on-network OOB access to routers and switches -> fallback OOB access from off-network
goal - monitoring and alerting
system monitoring (i.e. host level) -> alerts when a server is unavailable (ping script) -> service monitoring (i.e. major application level) -> tiered service monitoring -> tiered alerting -> off-network paging -> job monitoring (i.e. task level) -> documentation-driven monitoring configuration (so the monitoring system doesn't have to be maintained as a separate config) -> distributed monitoring
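The early "ping script" step can be a few lines of code. A true ICMP ping usually needs raw-socket privileges, so the following sketch swaps in a TCP connect check instead. This also edges towards service monitoring, because it confirms that a port is accepting connections rather than only that the host is up. The check names and the 3-second timeout are illustrative assumptions. Passing failures on to paging or email is not shown.

```python
import socket


def service_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def failed_checks(checks: dict, timeout: float = 3.0) -> list:
    """Run every (host, port) check and return the names of those that failed."""
    return [
        name
        for name, (host, port) in sorted(checks.items())
        if not service_up(host, port, timeout)
    ]
```

When run from cron, a non-empty result from `failed_checks` would trigger an alert. Requiring two or three consecutive failures before alerting is a common refinement that avoids paging on transient blips.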
goal - reporting
logs -> central logging host -> reporting and monitoring of logs -> uptime and availability from the monitoring system -> non-error reporting (performance/capacity management)
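Availability figures taken from a monitoring system are usually computed from periodic check results. The following is a minimal sketch, assuming each check is recorded as a simple up/down boolean at a fixed interval. It also converts an availability target into a downtime budget. The result is only as precise as the check interval, since an outage shorter than one interval can go unseen.

```python
def availability(samples: list) -> float:
    """Percentage of monitoring checks that found the service up."""
    if not samples:
        raise ValueError("no monitoring samples")
    return 100.0 * sum(samples) / len(samples)


def allowed_downtime_minutes(target_percent: float, period_minutes: float) -> float:
    """Downtime budget implied by an availability target over a given period."""
    return period_minutes * (1 - target_percent / 100.0)


# Four checks, one failed: 75% available.
print(availability([True, True, True, False]))
# A 99.9% target over 30 days allows roughly 43.2 minutes of downtime.
print(round(allowed_downtime_minutes(99.9, 30 * 24 * 60), 1))
```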