The security operations center (SOC) is the nucleus of an enterprise’s cybersecurity program. To implement an effective SOC, it is crucial to understand what an SOC is. An analogy may be helpful: All airports have a security team whose job is to identify potential threats and prevent dangerous situations from arising. Airport security teams are the first line of defense, and they are skilled at picking out, from the thousands of people moving through the airport, the few individuals who may pose a threat to national security because they are entering the country illegally or are involved in dangerous activities such as drug, human or animal trafficking. An SOC is expected to do much the same, which is why it is commonly referred to as the first line of defense. It plays a crucial role in the early detection of security threats within the environment. Several essential elements can help organizations strengthen their SOC.
Essential Tools in the SOC Tool Kit
The SOC relies on a range of security tools to monitor, detect, analyze and respond to security incidents and threats, including endpoint detection and response (EDR), network detection and response (NDR), security information and event management (SIEM), intrusion prevention systems (IPSs), security orchestration, automation and response (SOAR), user and entity behavior analytics (UEBA) and threat intelligence platforms (figure 1). Among these tools, SIEM is the cornerstone of the SOC analyst’s arsenal, playing a pivotal role in around-the-clock monitoring. It is also often the starting point for early detection of security threats within the environment. Therefore, the success of the SOC depends greatly on how SIEM is implemented, configured and used.
Best Practices for an Effective SOC
Ensuring a strong security posture requires the implementation of effective tools and practices within an SOC. The best practices and approaches to empower the SOC to achieve optimal performance and effectiveness include:
- Consider only security-relevant logs—Log monitoring is not equivalent to SOC monitoring, so only two types of log events should be incorporated into SIEM: security events used to build detection rules, and security events that add context to detected events. For example, firewall logs (e.g., threat, malware, Uniform Resource Locator [URL] filtering, intrusion prevention system [IPS] logs), web application firewall (WAF) logs, proxy logs and Windows operating system security logs (especially those detailing login success or failure, audit logs cleared and processes created) are relevant security logs used to create detection rules. Events that do not add value to the security monitoring process include those recorded in performance logs, availability logs, health logs, device failure logs and error logs. Incorporating these into SIEM only overloads the SOC as such events increase noise and false positive alerts and lead to higher SIEM costs.
- Understand the network architecture—Understanding the network architecture or how inbound and outbound traffic flows within the network is critical when analyzing an alert. Because SOC teams are often unaware of or have very little knowledge about network architecture, it takes them longer than necessary to act on alerts that are triggered. These delays impact key performance indicators (KPIs) such as mean time to detect (MTTD) and mean time to respond (MTTR).
- Leverage security logs for enhanced insight—Every device, service, data source or cloud platform that has audit-logging capability should record security events and forward them to the SIEM tool. For example, email, virtual private networks (VPNs), virtual desktop infrastructure (VDI) accessible over the Internet, single sign-on (SSO), multifactor authentication (MFA) and remote desktop tools used to provide remote IT support are some of the entry points for attackers. These elements have been targeted and compromised in various attacks. Therefore, incorporating security events from such critical infrastructure into the SIEM and building relevant detection capabilities are important.
- Customize detection rules—Detection rules are the main determinants of an SOC’s effectiveness, so it is important to define rules that are relevant to the specific enterprise. For example, if an enterprise typically does not operate 24/7 and does not have offices outside of its home country, its system can be designed to detect a spike in login activities on the weekend or login attempts from Internet Protocol (IP) addresses located outside the country. These events can include login events from data sources such as email, VPN, SSO and Azure Active Directory (AAD).
- Use the MITRE ATT&CK framework—This comprehensive framework provides an in-depth understanding of the tactics, techniques and subtechniques used by adversaries in real-world cyberattacks. It includes 41 data sources that defenders can use to log information into their SIEM and build detection logic.1 Mapping the enterprise’s alert logic to the MITRE ATT&CK framework adds value to the entire detection engineering process. Alerts triggered by these detections could be early signals of full-fledged attacks, so it is imperative to align detection methods with the tactics, techniques and subtechniques described in the MITRE ATT&CK framework.
- Respond in a timely manner—The SOC needs to be on high alert and show a sense of urgency when reacting to potential threats. SOC teams should leverage ticketing tools and collaboration platforms to effectively communicate alerts and required actions to the appropriate stakeholders. If there are repeated actionable alerts, the SOC team should implement a systemic and collective fix rather than responding to each alert with the same action. For example, in the case of alerts related to potentially unwanted programs (PUPs), it is important to analyze trends for the last one or two months and find the root cause, which could be:
- Full Universal Serial Bus (USB) storage access is allowed.
- Proxy rules are ineffective or not enabled.
- Local administrative access is provided to normal user accounts.
- Misconfiguration of security controls
- Security design flaws
- Unplanned changes that are not recorded and not approved
- Human oversight, resulting in gaps between what is documented and what is implemented
- Too many policy exceptions and listing allowances for different users within the enterprise, which are often more of a convenience than an actual business requirement
- SIEM detection rules that are not optimized
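The customized detection rule described above (weekend login spikes and logins from outside the home country) can be sketched in a few lines. This is only an illustration under stated assumptions: the `LoginEvent` structure, the `HOME_COUNTRY` value, the spike threshold and the idea that each event already carries a country resolved from its source IP are all hypothetical, and a real SIEM would express the same logic in its own rule language.

```python
from dataclasses import dataclass
from datetime import datetime

HOME_COUNTRY = "US"  # assumption: country code of the enterprise's only operating country


@dataclass
class LoginEvent:
    user: str
    source: str          # e.g., "email", "vpn", "sso"
    timestamp: datetime
    country: str         # assumption: already resolved from the source IP address


def flag_login(event: LoginEvent) -> list[str]:
    """Return the reasons (if any) a single login deserves analyst attention."""
    reasons = []
    if event.timestamp.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        reasons.append("weekend_login")
    if event.country != HOME_COUNTRY:
        reasons.append("foreign_country_login")
    return reasons


def weekend_spike(events: list[LoginEvent], threshold: int) -> bool:
    """True when the number of weekend logins exceeds a tuned threshold."""
    weekend_count = sum(1 for e in events if e.timestamp.weekday() >= 5)
    return weekend_count > threshold


events = [
    LoginEvent("alice", "vpn", datetime(2024, 3, 4, 9, 30), "US"),   # Monday, home country
    LoginEvent("bob", "email", datetime(2024, 3, 9, 23, 5), "US"),   # Saturday
    LoginEvent("carol", "sso", datetime(2024, 3, 10, 2, 15), "FR"),  # Sunday, abroad
]

# Keep only the users whose logins matched at least one rule.
flagged = {e.user: flag_login(e) for e in events if flag_login(e)}
spike = weekend_spike(events, threshold=1)
```

The threshold and the home country are the enterprise-specific parts. An organization that does operate on weekends or across borders would need different conditions, which is the point of customizing rules rather than relying on defaults.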
The Role of Threat Intelligence in an SOC
Threat intelligence is a vital part of the entire SOC process because it helps provide external visibility and context. It includes:
- Operational threat intelligence such as indicators of compromise
- Tactics, techniques and procedures (TTPs) of different attack groups targeting specific countries or industries
- Dark web monitoring of compromised credentials, squatted domains, or look-alike domains that are typically used for phishing attacks
- Information about a vulnerability being exploited in the wild
It is essential to take this external context into consideration. The value lies in using threat intelligence in an SOC framework that consists of not only detection-based alerts, but also situational awareness. For example, during the holiday season, attacks targeting consumer industries that typically hold holiday sale events should be expected and prepared for accordingly.
Automating the SOC
"Automation" has become a popular buzzword, but before automating an SOC, it is important to understand the foundation on which an SOC is built: people, processes and technology. Assuming an enterprise has the right SIEM technology and processes, it can run without automation, but it cannot run without people. An SOC needs to reach a certain level of maturity before introducing automation such as machine learning (ML) or artificial intelligence (AI). These elements should complement the SOC, not replace the people, who should be retrained and repurposed to do intellectually challenging tasks.
Once appropriate maturity has been reached, if an enterprise wants to introduce automation to an SOC, then the enterprise needs to clearly define its objectives. For example, an objective could be a 50 percent reduction in the manual efforts of level 1 analysts by automating their repetitive tasks, such as:
- Enriching alerts with threat intelligence feeds
- Checking the reputations of IPs, domains and URLs
- Tracking trends for each alert category: recurrence, root cause and repeated user violations
- Performing follow-ups and escalations
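Two of the repetitive tasks listed above, enriching alerts with threat intelligence and checking indicator reputations, could be automated along the following lines. This is a minimal sketch, not a reference to any particular SOAR product: `THREAT_FEED` is a hypothetical in-memory snapshot of a feed, and the alert fields (`indicators`, `intel_matches`, `needs_escalation`) are illustrative names.

```python
# Hypothetical local snapshot of a threat intelligence feed, keyed by indicator.
THREAT_FEED = {
    "203.0.113.7": {"reputation": "malicious", "tags": ["botnet"]},
    "bad-domain.example": {"reputation": "suspicious", "tags": ["phishing"]},
}


def enrich_alert(alert: dict) -> dict:
    """Return a copy of the alert with reputation data for any known indicators."""
    enriched = dict(alert)
    matches = {
        indicator: THREAT_FEED[indicator]
        for indicator in alert.get("indicators", [])
        if indicator in THREAT_FEED
    }
    enriched["intel_matches"] = matches
    # Escalate only when at least one indicator is rated malicious.
    enriched["needs_escalation"] = any(
        match["reputation"] == "malicious" for match in matches.values()
    )
    return enriched


result = enrich_alert({"id": 1, "indicators": ["203.0.113.7", "198.51.100.2"]})
```

In this sketch the level 1 analyst receives the alert with reputation context already attached and only has to review the cases marked for escalation.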
Although an autonomous SOC has benefits, the components of an autonomous SOC may also introduce risk into the environment. For example:
- If the automation workflow goes wrong, it could revoke the access of a valid, critical user.
- If the data used to train the ML model are modified in an unauthorized way, the ML model will not be trained correctly.
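One way to limit the first risk is to put a guardrail in front of any automated response, so that accounts considered critical are never revoked without a person approving the action. The sketch below assumes a hypothetical `CRITICAL_ACCOUNTS` list, a detection confidence score between 0 and 1, and illustrative action names; none of these come from a specific product.

```python
# Hypothetical list of accounts whose access must never be revoked automatically.
CRITICAL_ACCOUNTS = {"svc-payments", "domain-admin"}


def decide_action(user: str, confidence: float, threshold: float = 0.9) -> str:
    """Choose the response for a suspicious account, keeping people in the loop."""
    if user in CRITICAL_ACCOUNTS:
        # Critical accounts always go to a human, whatever the confidence score.
        return "require_human_approval"
    if confidence >= threshold:
        return "revoke_access"
    # Low-confidence detections are queued for analyst review instead.
    return "open_ticket"
```

For example, `decide_action("svc-payments", 0.99)` returns `"require_human_approval"`, while the same score for an ordinary account returns `"revoke_access"`.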
The Role of SOC in Audit and Compliance
In addition to the essential functions of threat monitoring and detection, threat hunting and incident analysis, the SOC plays a crucial role in an organization’s audit and compliance activities, such as:
- Compliance monitoring
- Audits and assessments
- Cyberincident investigations
- Cyberincident reporting to regulatory bodies
For these purposes, retention of security logs and alerts is a critical activity that an SOC needs to plan and implement as part of the SIEM deployment.
Why Security Logs Should Be Retained
Security logs help identify threats early in the attack phase by triggering detection rules. However, if a security incident occurs and an incident response plan and crisis communications have been invoked, historical security logs are needed to answer questions such as:
- What happened?
- Why did the incident take place?
- When was the incident identified?
- How long were the associated activities present in the environment?
- What are the impacted systems and user accounts?
Having historical logs can speed the incident response and forensics process and help identify the root cause.
In addition, cyberinsurance providers require enterprises to retain security logs so that a claim can be made in the event of a breach. To determine the scope of a data breach, cyberinsurance organizations may engage experts in the field of incident response and digital forensics. A lack of logs usually delays this determination and can have a negative impact on the claimed amount and the overall claims process.
How Long Do Security Logs Need to Be Retained?
Typically, security logs are retained for a minimum of 180 days, or six months. However, depending on the nature of the business, the geographic regions in which the enterprise operates, and the applicable standards and regulations, retention periods may vary. For example:
- The Payment Card Industry Data Security Standard (PCI DSS) requires security logs to be retained for 12 months, with three months of log data available for immediate analysis.2
- Directives issued by the Indian Computer Emergency Response Team (CERT-IN) require logs to be retained for 180 days.3
Other country-specific regulations also prescribe the number of days for which the security logs must be retained. It is advisable to have an organizational policy indicating how long security logs need to be retained, and this policy should align with the regulatory requirements of the country within which the organization operates.
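A retention policy that has to satisfy several requirements at once can be reduced to taking the longest applicable period. The sketch below encodes only the figures mentioned above; converting 12 months to 365 days and three months to 90 days is an assumption made for illustration, and the dictionary keys and function names are hypothetical.

```python
# Retention periods in days, taken from the examples in the text.
RETENTION_DAYS = {
    "baseline": 180,   # typical minimum of six months
    "PCI DSS": 365,    # 12 months, approximated here as 365 days
    "CERT-In": 180,
}

# PCI DSS: three months of data available for immediate analysis (approximated as 90 days).
IMMEDIATE_ACCESS_DAYS = 90


def required_retention_days(applicable: list[str]) -> int:
    """Return the longest retention period among the baseline and applicable rules."""
    return max(RETENTION_DAYS[name] for name in ["baseline", *applicable])


def storage_tier(age_days: int) -> str:
    """Keep recent logs immediately searchable; move older logs to cheaper storage."""
    return "hot" if age_days <= IMMEDIATE_ACCESS_DAYS else "archive"


policy_days = required_retention_days(["PCI DSS", "CERT-In"])
```

An enterprise subject to both examples would therefore keep logs for the longer PCI DSS period, with the most recent portion held in immediately searchable storage.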
Which Logs Should Be Retained?
From an SOC perspective, at a minimum, security logs that contribute directly to the detection rules must be retained for a longer duration. These include:
- Access and authentication logs, such as application, VPN, domain controller, proxy, SSO and email logs
- Server logs, such as Windows security event logs, Linux server authentication logs and Internet Information Services (IIS) web server logs
- Network logs, such as firewall and intrusion prevention and detection system logs
- Cloud platform logs, such as logs that provide information on bulk virtual machine (VM) creation or deletion, storage deletion, and changes to tenant administration or tenant policies
Where Should Security Logs Be Kept?
Considerations for log retention include:
- On-premises SIEM—If the organization uses an on-premises SIEM tool, logs can be stored on-premises using network-attached storage (NAS) systems or network storage servers. While this approach may be more cost-effective than cloud-based storage, it may lack speed and scalability.
- Software-as-a-Service (SaaS)-based SIEM—For organizations using a SaaS-based SIEM, it is advisable to retain logs in the cloud service provider’s data lake solution. This approach can help reduce the costs associated with transferring data out of the service provider’s cloud. Some leading SaaS-based SIEM providers have introduced cost-effective tiering options such as pay-as-you-go and per-day consumption pricing.
Organizations should select a log retention solution that they find both cost-effective and operationally manageable.
Conclusion
For a long time, SOCs primarily focused on traditional enterprise infrastructure, such as domain controllers, firewalls, servers and endpoints. However, as the adoption of cloud platforms, the Internet of Things (IoT), blockchain, AI and ML continues to rise, the boundaries between these new technologies and the conventional enterprise infrastructure are becoming less distinct. This situation presents new challenges for SOC teams. In response, SOC analysts must enhance their skills in these emerging technologies, integrate them into the SOC framework and establish specific detection methods to effectively identify and counter threats to this diversified infrastructure.
Endnotes
1 MITRE ATT&CK, http://attack.mitre.org/
2 Payment Card Industry (PCI) Data Security Standard (DSS), PCI DSS Requirements and Testing Procedures, Version 4.0, March 2022, http://docs-prv.pcisecuritystandards.org/PCI%20DSS/Standard/PCI-DSS-v4_0.pdf
3 Government of India Ministry of Electronics and Information Technology Indian Computer Emergency Response Team (CERT-In), No. 20(3)/2022-CERT-In, India, April 2022, http://www.cert-in.org.in/PDF/CERT-In_Directions_70B_28.04.2022.pdf
SHWETA KSHIRSAGAR | CISA, CISSP
Is an information security professional with 18 years of industry experience in various domains of cybersecurity, including cyberincident response, data protection and privacy, information security audit, and compliance. She was recently awarded a DynamicCISO Excellence Award for her project on security operations center modernization. She can be reached at http://www.linkedin.com/in/shwetaksagar/.