Logging for Security, Privacy, and Breach Cost Limitation
Developers are awesome at adding logging to help them debug problems.
In my experience they're often less amazing at logging for security,
privacy and/or audit-ability, especially if they haven't worked in an
environment where these matter. The lack of audit and/or
privacy logs can affect customer satisfaction with your product,
especially in a B2B context. I recently stumbled across research
that indicated that companies with user-managed controls related
to data privacy can expect an average 25% increase in customer
"intent to purchase" numbers [1], so this is potentially even a
money-maker if done properly, and certainly can enable more business
in many industries.
Logging for audit-ability, security, and sometimes privacy are
different animals, though there is generally some overlap. By
'audit-ability' I mean producing an audit log of major events that
your customer can view or have sent to them automatically. Nearly all
major SaaS tools supply this. Examples include Microsoft
Office 365, Salesforce, GitHub, Okta... All of them have something
they call an audit log that their customers can access and that
shows things like logins, logouts, configuration changes, failed
login attempts, file uploads and downloads, etc. These logs can
help the customer determine whether an account has been
compromised, for example, or whether information is being downloaded
in bulk, which is sometimes a precursor to staff leaving and is
generally frowned upon by your legal department.
Logging for security is logging that is intended to alert your staff
to events (common and uncommon) that might be indications of your
systems becoming compromised. This would include such things as
endpoint access logs, logging from any IaaS that your company uses
to host websites/APIs, OS-level logging (such as the Unix audit
log), and logging from your website and its underlying micro-services.
These logs are generally fed into a security information and event
management (SIEM) system for processing, which then alerts your
security/ops staff as appropriate. In my experience, all SIEM
systems require initial tuning, and ongoing tuning as your systems
change over time. Some SIEMs have additional features, such as
User and Entity Behavior Analytics (UEBA). Some more primitive
SIEM systems only look at a single log entry to determine if an
alert should be generated. Others are capable of looking back in
time to see whether earlier events leading up to the logged event
suggest it is a 'normal, everyday' sort of event or a potential
compromise. I suggest that a client choose a SIEM that is
history-aware.
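To make the difference concrete, here is a minimal Python sketch of a history-aware rule. It is not how any particular SIEM works; the class name, the 10-minute window, and the threshold of 5 failures are all assumptions chosen for illustration. A single-entry rule would see the final successful login as perfectly normal, while this rule looks back at the failures that came before it.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Hypothetical values; real thresholds come out of tuning.
WINDOW = timedelta(minutes=10)
MAX_FAILURES = 5


class HistoryAwareRule:
    """Flags a successful login that follows a burst of failed logins."""

    def __init__(self):
        # Timestamps of recent failed logins, kept per user.
        self.failures = defaultdict(deque)

    def observe(self, user, timestamp, success):
        """Record one login event; return True if it should raise an alert."""
        recent = self.failures[user]
        # Drop failures that have aged out of the look-back window.
        while recent and timestamp - recent[0] > WINDOW:
            recent.popleft()
        if not success:
            recent.append(timestamp)
            return False
        # A success is only suspicious in light of what preceded it.
        suspicious = len(recent) >= MAX_FAILURES
        recent.clear()
        return suspicious
```

Feeding it five failed logins for one user followed by a success inside the window returns True for the success; the same success arriving half an hour later returns False.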
Privacy logging can be related to logs required by regulation. For
example, the GDPR requires certain notices be sent to 'data subjects'
(typically end-users) of systems, especially when personal information
is altered or deleted. Privacy logging can also be a system feature
designed to improve your users' (or potential users') comfort with
using your systems. The study I mentioned above called out four areas
of special interest: transparency, data retention (the most important
area for most people surveyed), data sharing practices, and
data minimization. The relative ranking of these varies by industry
though, and the results may need to be validated for your product
or market.
I would suggest that Product Management and Engineering/Development
Management get together and define what logs, of all types, your
system needs to be successful and/or capture more business. These
should then be expressed as requirements for all system changes,
and be routinely covered as part of the design and code review
processes. Note that it can be very difficult to gather information
on your competitors in this particular area.
If you keep a log of all the calls into your software, your development
staff may point to it and say "but *everything* is in there",
and they may be right. But there are two factors that would drive
you to a separate audit log. First, the log of all the API calls is
not easy to parse when the customer is looking for 'configuration/audit'
type calls. Second, the retention for these huge logs is often
shorter than what you'll likely need for an audit log. It's generally
much easier to have a separate audit log, whose events can be given a
much longer retention time. You might even want these events to be
'permanent'. It's unlikely
you'll want the expense of holding on to every single app call or
micro-service log for years to enable this sort of log extraction
process.
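As a rough illustration of keeping the two logs apart, the sketch below uses Python's standard logging module to give audit events their own logger and destination. The logger names, the message format, and the helper make_logger are assumptions for the example. In-memory streams stand in for what would really be separate files or stores, each with its own retention policy.

```python
import io
import logging


def make_logger(name, stream):
    """Build a logger that writes only to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep events out of any parent/root log
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s %(message)s"))
    logger.addHandler(handler)
    return logger


# Stand-ins for two separate destinations with different retention.
app_stream, audit_stream = io.StringIO(), io.StringIO()
app_log = make_logger("app", app_stream)      # high volume, short retention
audit_log = make_logger("audit", audit_stream)  # low volume, long retention

# Every call lands in the general log ...
app_log.info("GET /api/items 200")
app_log.info("PUT /api/settings/password-policy 200")
# ... but only audit-worthy events are written to the audit log.
audit_log.info("password policy changed by user=admin42")
```

The customer-facing audit view then reads from the small audit destination only, and never has to filter the full call log.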
Alerting
Security alerting is heavily dependent on the systems in question generating
useful log events that can be indicative of security issues.
In complex systems, such as you might see with SaaS, PaaS and other
multi-tenant environments, the logs are sent to a central log aggregation system.
The log aggregation system reviews the logs as they come in, and generates
alerts when it sees something amiss.
These alerts are sent to the company's security staff, or sometimes ops
staff, for analysis.
Systems can create different levels of alerts, which are intended to get
more or less attention and/or immediate action from your staff.
So the flow looks something like this, for all systems that I'm aware of:
log messages from all sources (IaaS, OSs, micro-services, web servers, IDS/IPS) -> log aggregation/SIEM -> alerts sent to staff
Often these systems are referred to as Security Information and Event Management (SIEM) systems, and there are many on the market.
Some of these systems are very sophisticated and can identify and flag unusual
behavior patterns, such as a user who normally logs in from Texas showing up
on a connection from Russia. They may also be able to alert when one system
connects to another that it hasn't connected to previously.
These systems are sometimes referred to as providing user and entity
behavior analytics, or 'UEBA'.
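The login-location case can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how a commercial UEBA product works: the class name is hypothetical, the country is assumed to have already been resolved from the IP address, and a location is treated as 'known' after a single login.

```python
from collections import defaultdict


class LoginLocationBaseline:
    """Tracks which countries each user has logged in from."""

    def __init__(self):
        self.seen = defaultdict(set)

    def is_unusual(self, user, country):
        """Return True if the user has a baseline and this country is new."""
        known = self.seen[user]
        # With no history yet there is nothing to compare against,
        # so the first login only establishes the baseline.
        unusual = bool(known) and country not in known
        known.add(country)
        return unusual
```

A user with a history of logins from "US" who then appears from "RU" is flagged once. A real system would weigh far more signals (time of day, device, travel speed) before alerting.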
Note that you can also send the audit records from SaaS systems your company uses to most log aggregation/SIEM systems.
Sample Audit Events
The following is a brief sample of events you might want to retain in an audit log:
- Account Creation/Deletion - the date, time, and user ID of the entity
  (person or program) who created/deleted each account
- Logins and Logouts - the date, time, IP address, and account
  username/ID, and whether the login was successful
- Account lockouts - the date, time, IP address, and account username/ID
  that was locked out due to too many unsuccessful login attempts
- Attempted use of elevated/admin privileges, and whether or not the
  use was permitted
- Changes to account privileges - date, time of change, description and IDs
  of the impacted entity, and the entity making the change
- Changes to user Personal Information such as password resets,
  email address changes and the like
- Changes to password configuration settings such as password expiration,
  password re-use limits, etc.
- etc.
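One way to capture the fields listed above is a small structured record that serializes to JSON. The sketch below is only an assumption about what such a record might look like; the class AuditEvent, its field names, and the helper login_event are hypothetical and would need to match your own schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    event_type: str                    # e.g. "login", "account_lockout"
    actor_id: str                      # person or program behind the event
    timestamp: str                     # ISO 8601, UTC
    ip_address: Optional[str] = None
    target_id: Optional[str] = None    # impacted account/entity, if any
    success: Optional[bool] = None     # outcome, where one applies
    description: str = ""

    def to_json(self):
        """Serialize to one JSON line suitable for an append-only log."""
        return json.dumps(asdict(self), sort_keys=True)


def login_event(user_id, ip_address, success, when=None):
    """Build the audit record for a login attempt."""
    when = when or datetime.now(timezone.utc)
    return AuditEvent("login", user_id, when.isoformat(),
                      ip_address=ip_address, success=success)
```

Calling login_event("u-17", "203.0.113.5", False).to_json() yields a single line that can be appended to the audit log and later shown to the customer or forwarded to a SIEM.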
Incident Response
All of these types of logs, and especially any alerts triggered by a SIEM, are
normally fed into a company's Security Incident Response activity. They
are typically the most important of the indicators that the Response process
ingests.
In some cases these alerts will be sent to the same staff who are monitoring
other systems for uptime, utilization, database space available and the like.
Footnotes