Logging & Alerting

Logging for Security, Privacy, and Breach Cost Limitation

Developers are awesome at adding logging to help them debug problems. In my experience they're often less amazing at logging for security, privacy, and/or auditability, especially if they haven't worked in an environment where these matter. The lack of audit and/or privacy logs can affect customer satisfaction with your product, especially in a B2B context. I recently stumbled across research indicating that companies offering user-managed data privacy controls can expect, on average, a 25% increase in customer "intent to purchase" numbers ¹, so this is potentially even a money-maker if done properly, and it can certainly enable more business in many industries.

Logging for auditability, security, and sometimes privacy are different animals, though there is generally some overlap. By 'auditability' I mean producing an audit log of major events that your customer can view or be sent automatically. Nearly all major SaaS tools supply this. Examples include Microsoft Office 365, Salesforce, GitHub, and Okta. All of them have something they call an audit log that their customers can access, showing things like logins, logouts, configuration changes, failed login attempts, file uploads and downloads, etc. These logs can help the customer determine, for example, whether an account has been compromised, or whether information is being downloaded in bulk, which is sometimes a precursor to staff leaving and is generally frowned upon by your legal department.

Logging for security is logging intended to alert your staff to events (common and uncommon) that might indicate that your systems are being compromised. This includes endpoint access logs, logging from any IaaS that your company uses to host websites/APIs, OS-level logging (such as the Unix audit log), and logging from your website and its underlying micro-services. These logs are generally fed into a security information and event management (SIEM) system for processing, which then alerts your security/ops staff as appropriate. In my experience, all SIEM systems require initial tuning, and ongoing tuning as your systems change over time. Some SIEMs have additional features, such as User and Entity Behavior Analytics (UEBA). Some more primitive SIEM systems look only at a single log entry to determine whether an alert should be generated. Others can look back in time to see whether there are precursors to the logged event that would suggest it is a 'normal, everyday' sort of event or a potential compromise. I suggest that a client choose a SIEM that is history-aware.
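To make the 'history-aware' distinction concrete, here is a minimal sketch of a rule that looks back over recent events rather than judging a single log entry. The class name, window, and threshold are hypothetical values I picked for illustration, not taken from any particular SIEM product. A single successful login is unremarkable on its own; the same login right after a burst of failures is worth an alert.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Hypothetical thresholds; real values come out of the tuning described above.
WINDOW = timedelta(minutes=10)
MAX_FAILURES = 5


class LoginRule:
    """Flags a successful login that follows a burst of recent failures."""

    def __init__(self):
        # Per-user timestamps of failed logins still inside the look-back window.
        self.failures = defaultdict(deque)

    def observe(self, user: str, when: datetime, success: bool):
        """Return an alert string if this event looks suspicious, else None."""
        recent = self.failures[user]
        # Forget failures that have aged out of the look-back window.
        while recent and when - recent[0] > WINDOW:
            recent.popleft()
        if not success:
            recent.append(when)
            return None
        count = len(recent)
        recent.clear()  # a successful login resets the history for this user
        if count >= MAX_FAILURES:
            return f"possible credential guessing: {user} logged in after {count} failures"
        return None
```

A single-entry rule would have to either alert on every failed login (noisy) or ignore them all; keeping a short history per user is what lets the rule tell the two situations apart.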

Privacy logging can be related to logs required by regulation. For example, the GDPR requires certain notices be sent to 'data subjects' (typically end-users) of systems, especially when personal information is altered or deleted. Privacy logging can also be a system feature designed to improve your users' (or potential users') comfort with using your systems. The study I mentioned above called out four areas of special interest: transparency, data retention (the area ranked most important by the most people surveyed), data sharing practices, and data minimization. The relative ranking of these varies by industry, though, and the results may need to be validated for your product or market.

I would suggest that Product Management and Engineering/Development Management get together and define what logs, of all types, your system needs to be successful and/or capture more business. These should then be expressed as requirements for all system changes, and be routinely covered as part of the design and code review processes. Note that it can be very difficult to gather information on your competitors in this particular area.

If you keep a log of all the calls into your software, your development staff may point that out and say "but *everything* is in there", and they may be right. But there are two factors that would drive you to a separate audit log. First, the log of all the API calls is not easy to parse when the customer is looking for 'configuration/audit' type calls. Second, the retention for these huge logs is often shorter than what you'll likely need for an audit log. It's generally much easier to have a separate audit log, on which you can set a much longer retention time. You might even want these events to be 'permanent'. It's unlikely you'll want the expense of holding on to every single app call or micro-service log for years to enable this sort of log extraction process.
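As a rough illustration of keeping audit events apart from the general call log, the sketch below uses Python's standard logging module. The logger name "audit" and the helper names make_audit_logger and audit are my own hypothetical choices. The point is only that audit events go to their own destination, which can then be given its own retention policy.

```python
import json
import logging
from datetime import datetime, timezone


def make_audit_logger(handler: logging.Handler) -> logging.Logger:
    """Return a logger that writes only to the given audit destination."""
    logger = logging.getLogger("audit")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep audit events out of the general app log
    for old in list(logger.handlers):
        logger.removeHandler(old)
    logger.addHandler(handler)
    return logger


def audit(logger: logging.Logger, action: str, actor: str, **details) -> None:
    """Write one audit event as a single JSON line."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "actor": actor,
        "details": details,
    }
    logger.info(json.dumps(record, sort_keys=True))
```

In practice the handler might be a logging.FileHandler pointed at a dedicated file, or a handler that ships events to a separate store; the long retention is then configured on that destination rather than on the high-volume call log.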

Alerting

Security alerting is heavily dependent on the systems in question generating useful log events that can be indicative of security issues. In complex systems, such as you might see with SaaS, PaaS, and other multi-tenant environments, the logs are sent to a central log aggregation system. The log aggregation system reviews the logs as they come in, and generates alerts when it sees something amiss. These alerts are sent to the company's security, or sometimes ops, staff for analysis. Systems can create different levels of alerts, which are intended to get more or less attention and/or immediate action from your staff.

So the flow looks something like this, for all systems that I'm aware of:

log messages from all sources (IaaS, OSs, micro-services, web servers, IDS/IPS) -> log aggregation/SIEM -> alerts sent to staff

Often these systems are referred to as Security Information and Event Management (SIEM) systems, and there are many on the market. Some of these systems are very sophisticated and can identify and flag unusual behavior patterns, such as a user who normally logs in from Texas showing up on a connection from Russia. They may also be able to alert when one system connects to another that it hasn't connected to previously. This capability is sometimes referred to as user and entity behavior analytics, or 'UEBA'. Note that you can also send the audit records from SaaS systems your company uses to most log aggregation/SIEM systems.
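Both examples above (a user appearing from an unfamiliar location, and one system connecting to another for the first time) boil down to a 'first seen' check against a remembered baseline. The sketch below is a deliberately simplified, hypothetical version of that idea; commercial products use far richer models, and the class and method names here are mine.

```python
from collections import defaultdict


class FirstSeenDetector:
    """Remembers which values each subject has been seen with before."""

    def __init__(self):
        # subject (user or system) -> set of values seen so far
        self.seen = defaultdict(set)

    def observe(self, subject: str, value: str) -> bool:
        """Record the observation; return True if it departs from the baseline.

        The very first observation for a subject only establishes the
        baseline and is never flagged.
        """
        known = self.seen[subject]
        unusual = bool(known) and value not in known
        known.add(value)
        return unusual


logins = FirstSeenDetector()
logins.observe("alice", "US-TX")  # establishes the baseline, not flagged
logins.observe("alice", "RU")     # flagged: a location not seen before for alice
```

The same class works for the system-to-system case by passing the source host as the subject and the destination host as the value.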

Sample Audit Events

The following is a brief sample of events you might want to retain in an audit log:

Account Creation/Deletion - the date, time, and user ID of the entity (person or program) who created/deleted each account
Logins and Logouts - the date, time, IP address, and account username/ID, and whether the login was successful
Account lockouts - the date, time, IP address, and account username/ID that was locked out due to too many unsuccessful login attempts
Attempted use of elevated/admin privileges, and whether or not the use was permitted
Changes to account privileges - date, time of change, description and IDs of the impacted entity, and the entity making the change
Changes to user Personal Information such as password resets, email address changes and the like
Changes to password configuration settings such as password expiration, password reuse limits, etc.
etc.
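
One way to capture the fields listed above is a single record type shared by every audit event. This is a sketch under my own assumptions: the AuditEvent class and its field names are hypothetical, and your product may need more or fewer fields.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    event_type: str                  # e.g. "login", "account.created"
    actor_id: str                    # person or program that performed the action
    outcome: str                     # e.g. "success", "failure", "denied"
    target_id: Optional[str] = None  # account or object affected, if any
    source_ip: Optional[str] = None  # where the request came from, if known
    description: str = ""            # human-readable summary of the change
    # Date and time of the event, recorded in UTC.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialize the event as one JSON line for the audit log."""
        return json.dumps(asdict(self), sort_keys=True)


failed_login = AuditEvent("login", "alice", "failure", source_ip="203.0.113.7")
```

Making the record immutable (frozen=True) is a small design choice that fits an audit trail: once an event has been created, nothing downstream should be quietly editing it.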

Incident Response

All of these types of logs, and especially any alerts triggered by a SIEM, are normally fed into a company's Security Incident Response activity, and they are typically the most important of the indicators that the Response process ingests. In some cases these alerts will be sent to the same staff who are monitoring other systems for uptime, utilization, available database space, and the like.

Footnotes

1. Consumer Perspectives on Data Privacy and Implications for Business Growth

By Craig Payne