14 February, 2008
Year of SoD
If '00 and '01 were the years of the worm,
'02 through '04 the years of SOX, compliance, and executive oversight,
and '05 through '07 the years of organized crime and identity theft,
then
in the security realm these will be the years of Segregation of Duties.
Why?
7 Billion Dollars
Wall Street Journal
http://online.wsj.com/article/SB120168827173528415.html?mod=googlenews_wsj
CNN
http://www.cnn.com/2008/WORLD/europe/01/30/french.bank.ap/
Reuters
http://www.reuters.com/article/businessNews/idUSWEB304120080124
Bloomberg
http://www.bloomberg.com/apps/news?pid=20601085&sid=aSy8ZDtkdcow&refer=europe
On the Sub-Prime Side
Guardian
http://www.guardian.co.uk/business/2008/jan/30/subprimecrisis.creditcrunch?gusrc=rss&feed=networkfront
Financial News
http://www.financialnews-us.com/?page=ushome&contentid=2449684760
Information Security has a unique role it can play in protecting a company from these issues, a role that stems from the convergence of information. The information security team is the only place where all of the data needed to properly control for these types of complex issues comes together.
Addressing them requires the proper combination of identity management, role-based controls, and analytic business intelligence (the latter is the primary reason I championed the Analytic Environment standards over a year ago).
This is an area where Info Security can not only serve as a minimum barrier against downtime or confidentiality loss but can also add legitimate value to the business in the form of information, reports, and preventative controls, enabling increased trust in the actual people performing the real day-to-day work without the risk of a massive failure.
On the opposite end, SoD control failures are massive and systemic. Not only do they result in dramatic events like the ones mentioned above, but also in ubiquitous, often unintentional losses. From system downtime to improperly placed orders or improperly paid claims, these small incremental losses exist in every organization.
The real question now is can we position ourselves so that we are ready as these waves break?
02 April, 2007
Meshed defense
My question now is: can you give me one working example of a "Mesh" security design? (Not crypto.)
More detail on it at setuid just for you
27 February, 2007
Transport Layer Security - Part 1
Layer 4 is where the rubber meets the road as far as actual connectivity to the applications and logic of the controllers.
Layer 4 is the transport layer and for IP it typically means either TCP (Transmission control protocol) or UDP (User Datagram protocol).
I mentioned earlier that IP is inherently not deterministic and that has implications for automated control. Layer four is the first place where the compensations for this occur.
A quick run-through of how TCP works will help. I am going to grossly oversimplify here, so if someone wants to correct or provide more detail, feel free.
TCP establishes a session to ensure data delivery. A host initiates the communication by sending a TCP SYN packet. The recipient of the SYN responds with a SYN/ACK containing session identification information, and the original host responds with an ACK, establishing the session. Throughout the communication stream the acknowledgment process is repeated to ensure the communication is maintained. Checksums are included as an inherent part of the protocol. The time elapsed between packets received is monitored to determine if a session is lost and to initiate reestablishment of the communication stream.
What this means in a nutshell is that TCP has many mechanisms built into it that compensate (in part) for the issues introduced by the fact that IP is non-deterministic. It doesn't by any stretch of the imagination mean that TCP itself is secure in any way. There are many ways to game the system, and hackers and worms use them to their full advantage. If you really want to get into the details take a look at Nmap and the lists at www.Insecure.org.
The most common one, and the one I have seen cause issues on PLC's, is the SYN scan. It basically works by sending a stream of SYN packets to all of the selected ports on every address that is to be inspected and listening for the responses. Everything that responds with a SYN/ACK is logged. The connection is never completed with the final ACK. This is where the problem is (especially for controllers with older IP stacks): the receiving host uses some resources to sit there waiting for that final ACK while the connection is half open. There are DoS attacks related to this, but for the most part they are not that effective against newer IP stacks (SYN floods can still cause headaches though). Unfortunately PLC's do not always have newer stacks, so they are often particularly vulnerable to this.
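For the curious, here is a rough sketch of what a single half-open probe looks like. It is written in Python with Scapy purely as an illustration; the target address and the Modbus/TCP port are placeholders, it needs raw-socket privileges, and it is not a substitute for a real scanner.

    # A minimal half-open (SYN) probe sketch using Scapy (assumed to be installed).
    from scapy.all import IP, TCP, sr1, send

    target = "192.0.2.10"   # placeholder address, not a real controller
    port = 502              # Modbus/TCP, chosen purely as an example

    syn = IP(dst=target) / TCP(dport=port, flags="S")
    reply = sr1(syn, timeout=2, verbose=0)

    if reply is not None and reply.haslayer(TCP):
        flags = int(reply[TCP].flags)
        if flags & 0x12 == 0x12:      # SYN+ACK means something is listening
            # Never complete the handshake; tear it down with a RST instead.
            send(IP(dst=target) / TCP(dport=port, flags="R", seq=syn[TCP].seq + 1), verbose=0)
            print(f"{target}:{port} appears open (half-open probe)")
        elif flags & 0x04:            # RST means the port is closed
            print(f"{target}:{port} closed")
    else:
        print(f"{target}:{port} no response (filtered or dropped)")

Keep in mind that the host OS never saw the handcrafted SYN, so it will usually fire off its own RST as well; real tools handle this more gracefully. The point of the sketch is the half-open state itself, which is exactly the resource drain that hurts an old PLC stack.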
Aside:
This is directly relevant to the scanning discussions that have occurred with some level of passion in this blog's comments and in the background via email. My advice here: if you plan on scanning a SCADA system for the first time and you have done the change management, it is best to start with a TCP connect scan that exits gracefully as your initial connection enumeration method. Limit the scan to a few interesting ports and don't hit all 65k (at first at least); I wouldn't even do the fast-scan ports. After you have a few under your belt for that address range, slowly expand: do the fast-scan ports, then if wanted the whole 65k. After you are comfortable with this, make sure you have people watching the equipment and have a recovery plan, then try the SYN scans. Once you have gotten past this point you can go on to the rest of your vulnerability assessment or pen test. I know this is insanely conservative for most security professionals, but the critics are not exaggerating when they warn that bad things can (and will) happen. I am an advocate for scanning systems and have done so many times without significant issue on Rockwell/ABB, Honeywell, Siemens, and other vendor control systems, but there is always a risk. My typical response to the DON'T SCAN crowd is "Sooner or later the systems are going to be hit by an actual attack or something that is functionally identical to one, so wouldn't you rather that happen in a controlled manner?".
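To make the "start gentle" advice concrete, here is a minimal sketch of the kind of graceful connect probe I am talking about, using nothing but the standard Python socket library. The address, the short port list, and the pause are placeholders; substitute whatever your change management approved.

    # Minimal, deliberately slow TCP connect probe: full handshake, clean close.
    import socket
    import time

    target = "192.0.2.20"                 # placeholder PCN address
    ports = [80, 102, 502]                # a few "interesting" ports only, not all 65k
    pause_seconds = 5                     # go slowly; controllers do not like bursts

    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(3)
        try:
            sock.connect((target, port))  # completes the full three-way handshake
            print(f"{target}:{port} open")
        except OSError:
            print(f"{target}:{port} closed or filtered")
        finally:
            sock.close()                  # graceful close, no half-open session left behind
        time.sleep(pause_seconds)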
End of Aside
Many PLC vendors use TCP as their primary IP communication method to their controllers, and all of them use it for their historians, MES, and control aggregation systems. I have seen a bit of an explosion in HTTP access to endpoints, and I have mentioned ModBusIP in earlier posts in this series. I am not going to go into detail on what ports are used here; if you want to find out, ask your vendor and they will tell you. What you should do, however, is make sure that, where possible, you block access to the TCP port used as the primary PLC communication protocol at a point as close to the controllers as possible. ACL's are acceptable if actual firewalls are not available. For vendors that use standard ports such as telnet, HTTP, or RPC this can be somewhat more difficult to do. Take advantage of point to point and point to multipoint (subnet) rules. The key here is to not allow access to the PLC's from an uncontrolled network. Access to the historians and central control systems should be controlled primarily on a white-list basis. For really large engagements, such as regional operation centers, it is often possible to isolate both the central and the local subnets and connect them via VPN tunnels. If you are doing this it is best to isolate remote sites from each other.
Enough for today
Rest of TCP and UDP continued later.
08 February, 2007
Layered Security Control Series Aggregation Post
While these posts have specific data relating to SCADA and other control system environments, much of the information is applicable to any information security environment. Many of the concepts and much of the data in the posts are relatively basic and most useful for people who are just entering the information security and SCADA security fields, but there should be enough good nuggets of data that even experienced professionals will find some value in reading them.
My intention is to convert each of the sections into extended PDF’s and Pamphlets that have additional data and details over the initial posts. I am not certain when this will be done.
Building controls in multiple layers provides very strong security even with imperfect individual controls.
From an earlier post on layered controls
So if you can’t get 100% with a single control how do you get 100% or close to it?
I’ll use worms as the example because it is easy not because I think they are the most likely current threat.
If you can stop 80% of the worms with your company's external firewall.
Then stop 80% of the remaining worms with segmentation to your PCN.
Then stop 80% with a NIPS device
Then stop 80% of the remaining with a Host based firewall
Then 80% with patching
Then 80% with HIPS
Then 80% with Memory Based Protection
Etc…
If you can get an 80% reduction with each layer, then after stacking seven such controls you are down to roughly a 0.001% likelihood (0.2^7 is about 0.0000128), even if you had a 100% certainty of the threat event occurring to begin with.
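For anyone who wants to check my rounding, the arithmetic is just repeated multiplication by the 20% that leaks through each layer (a quick sketch):

    # Residual likelihood after stacking layers that each stop 80% of what reaches them.
    residual = 1.0
    for layer in range(1, 8):
        residual *= 0.2                     # 80% stopped means 20% gets through
        print(f"after layer {layer}: {residual:.7f} ({residual * 100:.4f}%)")
    # after layer 6: 0.0000640 (0.0064%)
    # after layer 7: 0.0000128 (0.0013%)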
So the trick is identifying the applicable controls, determining how they (and how much they) reduce the likelihood, and whether they can be layered with other controls.
By not relying on an individual control being perfect you reduce cost (because you have a greater choice of solutions), you reduce impact on the overall system design, and you increase flexibility for your designers and end users.
The posts of the series, in order, are:
Physical Security Layer
Data Link Layer Security Part 1
Data Link Layer Security Part 2
Networking Layer Security Part 1
Networking Layer Security Part 2
Transport Layer Security Part 1
Host Security Control Layers (being planned)
Process Controls including standards and procedural structures (TBD)
Governance Controls including visibility and audit feedback mechanism (TBD)
Financial incentives (Budgeting and leveraging business unit decisions using money and risk) (TBD)
Memetic Controls (Training, Expectation setting and Marketing) (TBD)
By properly combining the controls in these layers it is possible to get a working, flexible, and highly secure operating environment that is able to adjust to problems quickly with the least amount of cost.
29 January, 2007
Layer 3 – Networking - Continued
ACL’s, Firewalls and the bottom capabilities of NIPS
If you have successfully divided your PCN subnet from the rest of your LAN's, you still have to have a way to enforce that separation. Access Control Lists (ACL's), firewalls, and the bottom-layer capabilities of a NIPS provide methods of doing this. Note that I am not getting into ports yet; that's the next layer up.
At layer three they all function in a relatively similar manner and are close to being the same capability. Firewalls (and NIPS acting as firewalls) of any type are less likely to be susceptible to spoofing or man-in-the-middle attacks from traffic that must traverse between the PCN and the business network, but most routers and switches from the last few years have pretty robust ACL capability. A firewall-capable switch or router gives even more flexibility but isn't always available. The real key here is how the networks are set up.
For smaller organizations a single division point and one network is all that is necessary.
In this environment you would have a PCN connected via a firewall to the business network. If the business network has access to the internet (which they all do), it is essential that that access is also protected by a firewall. This isn't about protecting your business network, so I will skip the details here, but it is important to remember that if you have connections to your PCN then anything that compromises your business network also puts your PCN at increased risk. This means that a solid DMZ and extranet environment are important for the business network. I am writing all of the rest of this from the presumption that this is the case.
I have never seen an acceptable reason for a PLC to be directly accessible from the business networks, so putting in a log any any, drop any any rule (and dumping your logs to a syslog server) for PLC addresses should be the standard. If there is a need to directly access a PLC from a remote point (and there sometimes is), then use a VPN or some other secure authentication and communication method to facilitate the access. Terminate it on a separate subnet that has no direct external access and then route from there.
For larger companies and organizations there will be a need to provide multiple differentiated networks. Many organizations use a PCN DMZ (sometimes called a Process Information Network [PIN]) to house Historians and MES. By doing this you can granularly control access to actual control nodes while greatly simplifying secure access to data from the production nodes.
I have seen a lot of other distinctions as well:
Utility Networks – used to house servers that pass patches, AV updates, software revisions and other utility software (be careful that it doesn’t just become the easy way around security)
ESD Network – Emergency Shutdown Network – Just like the name implies, these house the systems used to shut down in an emergency. Access is very tightly controlled; often these systems are completely separated from all others.
Critical Systems or Red Line Networks – For highly critical valves, pumps, breakers and gauges a critical systems network allows tight granular access and control of access for systems that may have safety or environmental significance or for systems that might have cascading failure modes.
Monitoring Network – A network where PLC’s or RTU’s are used only for monitoring functions and have no direct control capabilities. Because the risk of inadvertent operation is much lower a looser set of controls can be applied. You still must be careful that it isn’t used as a jumping point to other systems. You also have to be careful if it is used in an open loop control scenario where an operator is making control decisions based on the readings.
Legacy Network – used to separate legacy and unmanaged equipment from the rest. This is a very important network to consider. The fact of the matter is that for many automated control systems there will be hold over systems that have distinct security issues that might be better off separated from other systems.
Vendor Systems Separations – many vendors who have taken up the security hue and cry have started defining their systems within specific subnetting requirements. In general this is a good thing because they can tightly control access and what traffic goes in and out based on their own hardware's needs.
Vendor PCN Extranet – An extranet subnet that houses servers to provide synchronization and control between divergent vendors OR (big OR not and) provide a controlled access drop off point for vendor access to systems for maintenance. I have seen both definitions used for the same term. If someone wants to come up with something better please do. I’ll float it and see if it catches on.
Partner PCN Extranet – Allows a controlled termination point for access either between operating partner networks or for external contractor controls either for troubleshooting or for actual operations.
Site PCN Extranet – Allows for the aggregation of information and data controls from multiple sites. It is distinguished from the PIN extranet in that actual control functions might be necessary such as on pipelines or long distance power transmission lines.
Site PIN Extranet – usually aids in the termination into a centralized control and operations center. It also provides a gathering point for production data into business systems in very large companies.
Whew...
There are actually a few more but I am stopping now. The key here is keep it as simple as possible. If adding one of the network subdivisions I mentioned above helps make control of access to those systems simpler and doesn’t make the overall design too complicated then use it. If, on the other hand, you only have a few dozen PLC’s and a single historian then the simplest solution is best. One firewall and at most two control networks, a PIN and a PCN should be fine.
Same catchphrases as always for firewall or ACL configuration: least rights needed for effective operation. The default at the end of the chain is deny any any, and above that are specific permits for the traffic that is absolutely needed. If they don't demonstrate a defined need to get to an address, don't permit it.
If you are on a more complicated network then the business network should access the PIN and vice versa, and the PCN should access the PIN and vice versa, but it should be designed such that the PCN never needs to access the business network or vice versa.
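As a toy illustration of that rule ordering (specific permits above, deny any any with logging at the bottom), here is a little Python sketch. The subnets are made up and this is obviously not a substitute for a real firewall policy; it just shows the first-match logic and the reachability matrix described above.

    # Toy first-match rule evaluation: specific permits above, deny any any (logged) at the bottom.
    import ipaddress

    rules = [
        # (source network, destination network, action)
        ("10.20.0.0/24", "10.30.0.0/24", "permit"),   # business -> PIN
        ("10.30.0.0/24", "10.20.0.0/24", "permit"),   # PIN -> business
        ("10.40.0.0/24", "10.30.0.0/24", "permit"),   # PCN -> PIN
        ("10.30.0.0/24", "10.40.0.0/24", "permit"),   # PIN -> PCN
        ("0.0.0.0/0",    "0.0.0.0/0",    "deny"),     # default: deny any any, log it
    ]

    def check(src, dst):
        src_ip, dst_ip = ipaddress.ip_address(src), ipaddress.ip_address(dst)
        for src_net, dst_net, action in rules:
            if src_ip in ipaddress.ip_network(src_net) and dst_ip in ipaddress.ip_network(dst_net):
                if action == "deny":
                    print(f"LOG: dropped {src} -> {dst}")   # feed this to your syslog server
                return action
        return "deny"

    print(check("10.20.0.5", "10.30.0.10"))   # business to PIN: permit
    print(check("10.20.0.5", "10.40.0.10"))   # business straight to PCN: deny (and logged)

Note there is no business-to-PCN rule anywhere above the default deny, which is the whole point.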
ESD and Redline Networks should be locked tight except during controlled change windows.
24 January, 2007
Layer 3 – Networking - Security
IP is on controllers and control networks.
Of course IP is everywhere. Why wouldn’t it be?
It is so beautifully simple. Some of the best and most elegant engineering I have ever seen.
With 4 bytes of information (likely less than the amount of information required to encode two letters of your name) you can get from any computer in the world to any computer in the world and back again.
Oh, this is a bit over simplistic. There is certainly more information involved in the total train of the data movement but as far as your computer is concerned only 4 bytes matter. How simple can you get? The fractal complexity that grows from this seed is amazing.
Enough cheesiness.
The consequences of this are what make all of the other security concerns significant. If a PLC or MES is connected to an IP network (even indirectly) then anyone in the world that knows how can access them (though not necessarily easily). With controllers and MES’s the way they are currently designed that means that potentially anyone in the world can operate them. That means that anyone in the world can potentially operate the equipment they are connected to.
Everything else flows from this.
So what are the control mechanisms for layer 3?
VLAN’s
Subnetting and Subnet design
Routing
ACL’s
Firewalls
NIDS
NIPS
VLAN’s
For the most part a VLAN’s purpose in layer 2 is to logically divide and possibly isolate separate information conduits. The significance in layer three is that it is very easy to route around a VLAN as a divider. This can be done in several ways. The most common is simply using a router but dual homed systems and multi homed systems are also a threat. Basically what this means is that the control aspects gained using VLAN’s on layer 2 are useless if there is open routing of any type between the VLAN’s. Many times I have been told “oh don’t worry it is on its own VLAN”. The engineer thinks that somehow that provides isolation. It doesn’t. The point is that a protection that can be quite effective when viewed exclusively from the perspective of its own layer can be easily rendered useless at a higher or lower layer if it is not coupled with additional controls.
Subnetting and Subnet Design
By themselves subnets provide very little control. Done properly they can provide slight advantages to other controls. More importantly, if done improperly, they can actually make it impossible to secure a system by drastically reducing the options of control available.
PCN’s should be on their own subnet. There is no technical reason for a PCN to co-reside on a subnet used for other purposes. They often do because it is difficult to get a new network set up specifically for use as a PCN and there is a cost associated with separating them but in my opinion the small additional cost and amount of work is trivial compared to the amount that not separating them increases the threat environment. This is true even for non-significant PCN’s.
This one might be a bit contentious but I am a fan of using private address spaces for PCN’s. It provides some control in that it limits the potential external accessibility (ok not much but even a little can help), it helps people keep the networks separate in their minds, it doesn’t significantly impact connectivity and it allows some obfuscation of the environment at least from certain perspectives. The only real drawback is that to access it remotely NAT might be necessary (of course I kinda see this as a plus).
Keep the subnets relatively small while allowing for growth. There is absolutely no reason I can think of for having a 248 or 240 mask (255.255.248.0 or 255.255.240.0). If the PCN is going to be that large it wouldn't hurt to logically divide it anyway. Increased division can also help from a redundancy and reliability standpoint by facilitating the use of routing protocols for redundant paths instead of spanning tree. Use spanning tree only for close redundancies, one or two hops at most (in my opinion not even then; I am really not a fan of spanning tree. I see it as an attempt to inject layer 3 functions into an inherently layer 2 protocol suite, and its only valid function in my mind is stopping loops, not providing redundancy. Sorry, networking religious quirk of mine). Use routing for anything more significant.
If you have a large enough site to require multiple subnets and you are using private addresses (or are lucky enough to have a huge public range and choose to ignore my advice to use private ranges anyway), choose subnet breakdowns that allow for easy masking for expansions or acquisitions (net ranges at 16, 32 or even 64 on a 10.). This is good advice for normal networking as well. I don't know how many organizations I have seen paint themselves into a box with 10.1, 10.2, 10.3 schemes that prevented easy logical aggregation using the octets themselves without sucking up huge ranges.
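A quick way to sanity check an addressing plan is Python's ipaddress module; the ranges here are purely illustrative, but they show why carving sites on clean power-of-two boundaries pays off later.

    # Checking that site ranges carved on clean boundaries can be summarized with one mask.
    import ipaddress

    region = ipaddress.ip_network("10.16.0.0/12")     # covers 10.16.x.x through 10.31.x.x
    sites = list(region.subnets(new_prefix=16))       # one /16 per site: 10.16/16, 10.17/16, ...
    print(sites[0], sites[-1], len(sites))            # 10.16.0.0/16 10.31.0.0/16 16

    # Contrast with a 10.1, 10.2, 10.3 scheme: those /16's never collapse into one clean summary.
    messy = [ipaddress.ip_network(n) for n in ("10.1.0.0/16", "10.2.0.0/16", "10.3.0.0/16")]
    print(list(ipaddress.collapse_addresses(messy)))  # still more than one route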
Routing
With one exception (the Gulf of Mexico's deepwater rigs), almost all PCN's I have seen have been small enough that they are end subnets on any routing network. My only real comments on this one are: why route it if you don't need to, and if you do route, contain the gateways and paths to something you (or at least your organization) have control of.
MPLS hasn’t caused any significant problems that I have seen yet but it can be compromised from the provider side. This compromise is not limited to watching traffic. A friend of mine and I successfully did an injection attack by replacing labels in line using a perl script. We convinced “customer” network Alice that we were an address on “customer” network Bob and pinged addresses in Alice. This was in a lab environment so this is easier said than done but it is possible. The main reason I think this is significant is that in some nations access to the nodes of the provider network might not be as controlled as in others. Of course the same risk holds true for Frame Relay and ATM but the pool of potential hostiles that are knowledgeable enough to pull it off for those two is a lot smaller. I also trust the carrier networks less because I know that many of the MPLS networks are growths from the older and uncontrolled MIP days. Frame Relay and ATM networks were never used as direct IP ISP’s. (though they did carry them at a different layer) Plus MPLS is growing like a weed because it saves the carriers money and they can pass a bit on to the customers.
Anyway you’ve been warned.
Enough writing for now. I’ll do ACL’s Firewalls and NIPS/NIDS Thursday or Friday.
17 January, 2007
Data Link – Layer 2 Continued
In the comments on the first half of the post, Dale from Digitalbond mentioned DNP3 as a layer two protocol implemented over Ethernet and correctly pointed out that Modbus IP was an application layer implementation of one of the communication protocols that ran on the older RS-232 links. Ron seconded this. There are a lot of other similar instances as well from many control vendors. They basically packetize simple direct connection communications, often (always, as far as I can think) without any authentication. I can think of ones from Rockwell/ABB, Siemens, and Honeywell off the top of my head. They were proprietary layer 2 communication protocols, and to enable their easy use over IP networks a simple (usually very simple) packet based communication string was set up. Usually a bunch of checksums and CRC's are used to try to deal with the deviation from a deterministic network.
When layer two protocols run on layer two point to point connections there is rarely much of an issue. Security can be handled as a physical access problem and the realistic threat pool is vanishingly small. Not too many people are willing to separate your wires from thousands of others, climb to the top of a pole, or dig into a ditch to tap into a single link to a single RTU or PLC. Even if they were, it wouldn't net much.
The real risks come from two different implementation mechanisms. Wireless deployments and efforts at implementing layer two protocols over layer 3+ designs and/or integrated with multipoint layer two mechanisms seem to present the most problems.
Now that I think about it wireless could probably be considered just another example of the last point.
A quick comment on DNP3 over IP and Ethernet from a networking standpoint (as opposed to a security standpoint, though this certainly fits with reliability and therefore availability). DNP3 uses a ton of CRC’s so it is pretty chatty from a collision domain standpoint. For smaller implementations this probably won’t show up but for larger sites and for networks that have multiple uses you will have a lot of collision storms if you either have older networking equipment (hubs) that aren't switched or you have a lot of nodes converging at a single point.
This symptom leads to one of the most common security mistakes I see, stemming from a misunderstanding of the layer 2 and 3 overlap: the "it's OK, you can't sniff it because it is switched" response.
First of all, in most cases I couldn't care less if you can sniff most SCADA traffic (the whole AIC vs. CIA conversation). I do, however, care if you can interrupt traffic or, worse yet, insert invalid traffic (intentionally or not). On Ethernet it is a trivial exercise to do this.
So far I have been spending most of my time talking about wired communications but wireless has been around for a long, long time in the SCADA world. When used exclusively for telemetry it is mostly harmless. The one thing to be very careful of in a telemetry monitoring mode is that open loop decisions that are taken using suspect data are subject to initiating cascading failures. Decisions made remote from a site due to old or inaccurate data can easily lead to a chain of improper system and people responses.
My biggest concern with the wireless deployments and equipment I have seen recently is that they don't seem to be learning from the mistakes in the IT world. 802.11 equipment is prevalent and it is often just used with default settings. There is a huge pool (many thousands) of people and devices specifically looking for openings in 802.11 networks. Even spread spectrum equipment is often implemented with default factory settings. This results in being able to connect to the back end networks without authentication simply by having the right equipment. Admittedly few people have this equipment, but it isn't difficult to get and is sometimes relatively cheap. Since the back end connections for much of this equipment are IP networks, it is often trivial to get onto the PCN (sometimes from a great distance away).
In summary, the SCADA controls for layer 2:
For RS-232 or 485 the only real protection mechanism is physical line security. There is an inherent risk mitigation for RS-232 due to the fact that it is only point to point. Even if you can easily tap (and interfere with) one of the connections, realistically it is very difficult to affect the overall operation of the system because there are usually multiple nodes that provide correlating information and control. Unless all or most of those nodes are interfered with there is usually little risk of significant impact.
A single point tap to Ethernet and IP deployments provides access and control functionality to all nodes that are not specifically isolated on that network. This greatly increases the risk. Controls for Ethernet include MAC filters, NAC (not quite ready yet but emerging), VLAN’s, Port disabling/control, node level segmentation and dynamic monitoring and response.
Similar to Ethernet, wireless implementations pose the potential risk of access to multiple nodes from a single access point. It is worse than Ethernet in that physical proximity is not essential for the compromise to take place. The easiest control for wireless is simply to not use it unless necessary. Unfortunately it is necessary (actually essential) in many, many instances. If possible, one of the most effective controls is to limit wireless connections to a point to point model where it is not possible for any of the nodes to access the root communication network. Only the historian or system they need to report to should be accessible. If aggregation points are necessary, use some means of authentication coupled with encryption. Avoid using factory defaults unless those defaults include strong node authentication. For 802.11, controls include WEP (for encryption [it makes it just slightly harder to connect and helps protect at other layers]), EAP (and variants LEAP and PEAP), WPA, and MAC filters.
Let me be clear here, I am not saying to not use these technologies. There is an enormous amount of value in using them and in many cases security is actually being improved when they are properly implemented. I am saying to use the controls that are appropriate for the level of safety or risk associated with the system the controls are on.
15 January, 2007
Data Link – Layer 2
OK - Second Layer. In OSI it is the Data Link layer, with collision detection, collision avoidance, Ethernet, Token, TDM, and all of the others. In a nutshell it is how the systems talk to each other on a point to point basis. When you are talking Ethernet and switching (especially spanning tree) you get overlap into layer 3.
There are a number of areas where it is significant from an information security standpoint for SCADA systems. In the last 10 years the conversation has become dominated by the Ethernet issues but there are other significant issues occurring as well particularly in the wireless realm.
RS-232 (now EIA-232) was the prominent linking mechanism for quite some time (defined in 1969). A PLC can play the part of either DTE or DCE depending on its function in the design. There are some huge advantages to RS-232. It supports deterministic timing, meaning that actions and responses can be watched in real time and reactions can be based on ladder logic layouts without much concern about a "lost" packet. It supports a sufficiently high data transfer rate for most automation processes and it has been well tested and used. RS-232 is falling somewhat out of favor as a connection mechanism in the automation world and is largely being replaced by Ethernet for local connections. (Boy, that sentence is going to generate some hate mail.) Although IP is really at the next layer it is part of this shift, and in the rest of the networking world this shift happened over a decade ago. If you look at my older posts this syncs with my stand that the automation world lags the rest of the information systems cycle by two to three generations and 8 to 10 years.
There are some substantial security implications of this shift to Ethernet. First of all, the shift has just started. Less than 20% of PCN's are Ethernet, but most of them (say 80 to 90%) have direct control connections to the Ethernet network via various aggregation tools/methods such as RSLinx. Ethernet, while very reliable if properly deployed, is definitely not deterministic. Multiple nodes exist on the same structure and they work on a modified collision detection scheme. If one node is talking the others wait random periods of time to start. Switches mitigate a lot of this by separating the collision domains, but when a destination node is receiving traffic from multiple sources there are still lost packets. This is largely overshadowed by significantly greater data transfer rates.
There are some very specific weaknesses of Ethernet that I am concerned about in the PCN world. The most prominent is ARP spoofing. Without getting into the details (I'll save that for the follow-on PDF's I am starting to write), ARP spoofing involves taking advantage of the way Ethernet hosts map IP addresses to MAC addresses to allow one node to "pretend" it is another node. Although I have never personally seen, or even heard of, a case of ARP spoofing on a PCN, the entire architecture would be very vulnerable to it. I think the biggest reason it hasn't emerged yet is that there is no real need to do it at this point. If there is no authentication to a Modbus IP node anyway, why bother pretending you are from somewhere else? As ACL's and in-line firewalls increase in prevalence I think the frequency of ARP attacks will increase. This could have a very significant impact on devices that are so fragile that they croak when a SYN scan hits them.
Controls for the Data Link layer are pretty simple. For Ethernet, MAC filters (recently in the form of NAC) and switch configuration shutdown of unused ports (which overlaps with physical security) serve as a first layer.
NAC is emerging but still needs some development. What it really comes down to for NAC is that a device needs to talk to an end point to be authenticated in any way (let alone a fancy key exchange followed by certificate verification). Since it needs to talk it has to be given the opportunity to connect to the network. What this eventually evolves into is a means of quarantining a device in an “unauthenticated” VLAN until it is verified by some means. This inserts multiple points of opportunity to overcome the defenses. Any time the layers work against security instead of for it you can almost guarantee that someone will find a hole.
The NAC schemes that seem most likely to succeed involve identification of the MAC as an accepted MAC through authentication and verification that occurs in a quarantine VLAN. A lot of the schemes are using DHCP because it already has a means of differentiating based on MAC address, but this has the weakness of not covering static addresses. All of these NAC methods require upgrades or replacement of existing hardware for most implementations. Other NAC schemes involve searching for the bad guys and using some other mechanism to expel them from the network.
VLANS are the next major control associated with layer 2 in the Ethernet environment. Basically they are a means of segmenting traffic into separate “networks” on the same devices. They can be set up using different mechanisms as the differentiators for which traffic belongs to which VLAN. The most common I have seen at sites is a simple port assignment. With this mechanism ports 1-6 are assigned to VLAN Bob, ports 7-12 to Alice and so on. Since each VLAN is a separate logical network they typically “cannot” talk to each other without a layer 3 connection. VLAN’s are often associated with a specific IP subnet (sorry layer three + again here).
The last part is the catch from a security perspective. Network Administrators and Engineers almost always assign a gateway for each that has no filter or ACL to prevent Bob from talking to Alice or worse yet Eve (at a completely different site) from mugging Bob. Just because they are on a different subnet does not mean they cannot talk to or interfere with each other. The problems for this don’t occur at layer 2 but when designing, operating or auditing you shouldn’t think that being on a different VLAN by itself is a protection. A further complication with VLANS set up via Port assignment is that there is often a VLAN used for management or troubleshooting that is assigned the entire port range (or at least overlaps other ranges). Any device that bridges these also serves as an entry point. It also serves to complicate the design.
VLANS can also be based on source MAC addresses, QoS classifications, IP addresses and other means but I haven’t seen that much of the more detailed assignment mechanisms in the ACS world. MAC address differentiators are sometimes used but have most of the same pitfalls of the port based VLANing. Some realistic NAC implementation mechanisms try to take advantage of MAC based VLANing to provide the quarantining I mentioned above.
Key point here. Just because it is on a different VLAN does not mean it is segregated. Try something simple. Ping it from one device to the other.
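If you want to make that test a little more repeatable, something like this sketch works. The addresses are placeholders and the ping flags assume a Unix-like host (Windows uses -n and a different timeout flag).

    # Quick cross-VLAN reachability check: if these answer, the VLAN's are not segregating anything.
    import subprocess

    other_vlan_hosts = ["10.50.1.10", "10.50.1.11"]   # placeholder addresses on the "other" VLAN

    for host in other_vlan_hosts:
        result = subprocess.run(["ping", "-c", "1", "-W", "2", host],
                                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        status = "REACHABLE (so much for segregation)" if result.returncode == 0 else "no reply"
        print(f"{host}: {status}")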
Already too much writing for the weekend. I’ll continue later this week.
Update:
Continued Here
05 January, 2007
Physical Security - Layer 1
As I have said previously I tend to think of Security layers in terms of an expanded OSI model. This might be somewhat simplistic but it does provide an easy structure for a working defense in depth strategy. In many cases it also matches well to the domains, objectives and ISO categories. In areas where it deviates it often fills gaps rather than creating superfluous work.
Strictly speaking, layer 1 deals with the standards for physical connections, radio and wireless characteristics, and timing and signaling mechanisms. I am not talking about the actual OSI layer; I am just using it as a conceptual guideline.
Physical Security is one of the fundamental pieces of the information security structure and is essential for proper defense in depth. Physical Security requirements are recognized in ISO 17799 as a category, within CoBiT in multiple control objectives and in ISC2 as a domain. It is often one of the more difficult aspects to deal with. Direct control of Physical Security is often out of the hands of IT or Engineering (typically for good reasons). Wireless mechanisms complicate proper implementation of physical security by bypassing existing mechanisms of control. Finally many Physical Security best practices and needs fall outside of the actual scope of Data Security. All of these are standard complicating factors when dealing with Physical Security.
Within the Automated Control world the physical security becomes far more complicated in that it also includes aspects of safety. While many of these are issues that properly reside in the responsibility realm of the engineers and operators it is still essential that the people responsible for managing information security risk understand how they work. Though they are not directly part of the information security realm, often proper physical security and physical design parameters can mitigate much or even all of the risks presented by information systems ties. There are also some unique challenges to obtaining the typical requirements for physical security of information systems.
Perimeter Security, Controlled access, Manned monitoring and reception, Environmental Controls, Control of access to cables, Public Areas, Secure Disposal methods and Monitoring of support infrastructure fall within this realm in typical Information Security implementations. Within ACS deployments Fail Safes, interlocks, inherent physical characteristics, proper finite element analysis and redundant essential systems (three pumps) greatly reduce risk of issues in critical systems. These should be added to the standard list of physical concerns to understand for information security professionals that deal with SCADA systems. When properly implemented, these design criteria and mechanisms can alleviate many of the concerns that are often cited in information security risk profiles for SCADA or ACS.
Perimeter Security is the establishment of a clearly defined boundary with controls to ensure that only the proper people have access to the equipment and systems within. The typical perimeters are walls, fences, hedges, cages, and separate offices or buildings. To be effective they have to be combined with controlled access and manned monitoring. Wireless systems circumvent perimeter security mechanisms completely and therefore must have a differentiated access control mechanism instead. ACS and SCADA complications to perimeter security mainly deal with scale. Some oil fields span hundreds of square miles, Power Lines are ubiquitous and have many unmanned transformer and switching stations, water systems and pipelines go through towns, cities and neighborhoods and can stretch for thousands of miles. While remote pumping and transformer stations usually have perimeters they are rarely manned. For reasons that have nothing to do with IT security they are usually well monitored in the form of alarm systems and physical access barriers but often the incoming telecommunication systems are accessible outside of this perimeter. A mitigating factor to physical access risk that deviates from a standard IT environment is that many of these systems are so remote that it would be very difficult for someone who is not already "inside" to access them. The North Slope and offshore rigs come to mind. This mitigating factor should be considered but not always relied on.
Controlled access includes locks, gates, key card entries, and the reception lobbies. For wireless systems it includes the authentication mechanisms. All of the encryption in the world is useless if you have no means of authenticating access to the root system. This was the entire nature of the misunderstanding of WEP for 802.11 and all the problems that have stemmed from those mistakes. This same gross conceptual error also extends to the spread spectrum systems being deployed currently in many SCADA and PCN environments. Just because I am unable to intercept communications between a base station and a node does not mean that I cannot connect to that base station directly, provided I have the right settings. Without some form of authentication it becomes a function of security by obscurity. All of the devices and networks become accessible (sometimes from up to 100 km away) with one mistake.
Eventually any physical barrier or controlled access mechanism can be bypassed. At this point manned monitoring becomes an essential piece of the physical controls. Typical monitoring mechanisms are direct manning, patrols, cameras, log reviews and equipment monitoring. The last piece is one of the greatest mitigating factors for good ACS security. Almost all operating machinery has an operator somewhere monitoring it or the system attached to it. By properly using/training these individuals a significant reduction of risk can be obtained. The presence of these operators is one of the significant advantages that many SCADA environments have over the typical office environment. In some other post I will discuss Segregation of Duties and how in many cases these operators are one of the most likely risks but for the purposes of enhancing physical security they are one of your best assets.
Interestingly enough, environmental systems are often one of the stealth ACS environments out there that almost every organization is dependent on. HVAC systems are essential for the proper operation of any data center and are more and more likely to be controlled by network accessible interfaces. It is also becoming increasingly common for power distribution panels to have standardized Ethernet accessible PLC's controlling them. Other than the realization that these systems are increasingly likely to be hackable, there is little to differentiate the physical environmental requirements of ACS vs. standard IT systems. Redundant power and proper cooling and heating are all important. One thing for engineers to keep in mind is that many security systems such as firewalls, NIPS, and switches are designed for a data center environment. They may not perform well in a shed that reaches 20 below zero. I have seen a firewall implementation mandated by information security have difficulties with MTBF for precisely this reason. Note to vendors: if you want to get into the SCADA market, start designing more resilient equipment. A typical Ethernet switch placed 10 feet away from an operating paper machine rarely lasts long.
Control of access to cables can be very problematic in a PCN environment. When a network extends for miles there are any number of points where access can be obtained. Fortunately there is some mitigation in the form of departure from typical Ethernet connections (at least as long as it lasts). Most extended networks require some form of longer range layer two connectivity. I will discuss these items somewhat in layer 2. Including the fiber runs within trenches or other relatively inaccessible paths can help further mitigate risks associated with this control, but for large geographic areas there are definitely challenges. For facilities with defined areas it is worth ensuring that cables that cross public roads or areas are not easily accessible, or are protected at another layer if it is unavoidable. A key problem I have seen with this is RJ-45 outlets to a PCN Ethernet segment without any identification of the network type or any way of controlling who plugs into it. This often occurs when an engineer thinks it is alright to put a PCN connection in a conference room (or office, or even home) that he commonly uses. While not absolutely essential, complete physical separation (including switching infrastructure) of the PCN from all other networks should be considered. If the system is safety essential, critical, or "red line", such as ESD systems, then complete physical separation should be considered essential.
For the IT people reading, "fail safe" refers to the failure mode of specific equipment or systems. As an example, valves fail in three modes with a loss of power: open, shut, or as-is. The engineers who design the system determine which failure mode provides the safest environment for a given system and status. Interlocks ensure that when certain devices or systems are operating in a specific manner, other specific actions cannot happen.
From an information security standpoint an important aspect to consider is the dependence of failure modes and interlocks on programmable controllers. Ideally a fail safe position is a fail safe position and nothing can alter it. It is an inherent part of the system. The same should be true for interlock responses. The problem usually occurs when specific programmable settings are used to enact the fail safe or interlock and those settings can be altered. I have seen some problems with this in some ladder logic deployments (essentially a series of interdependent switch positions). Because controllers are more likely to be remotely configurable, it is more common to see interlock settings and fail safes that can be altered without the knowledge of the operators or engineers. This is one reason that control of physical access to the PCN (and by extension the PLC's) is so important. The flip side of all of this is that if the fail safes, interlocks, and other inherent design considerations are done well, it is very difficult for any failure mode to cause any significant issues. In a well designed system three or more sequential failures (at least one of which should be a physical property of the system) must occur before safety is compromised.
I couldn't tell you how many times I have sat in a room with an information security professional talking with engineers, and the IT guy states that the risks include fires or explosions. The engineers usually just roll their eyes. The fact of the matter is that in a well designed system, even if an operator with complete access to the systems forcibly does things wrong, it is usually very difficult to force a catastrophic failure. Of course I have also seen the reverse of this happen. If the fail safe is dependent on the proper operation of a PLC and that PLC configuration becomes suspect, then that fail safe is no longer dependable. When an engineer learns of this the response is often a great deal of concern.
20 December, 2006
Measurable Layers
I guess this is good. I obviously tapped into a healthy meme seed but I do have a bit of a dilemma.
I am not really sure what I meant.
Well I am sure what I meant but I am not really sure how to articulate it. (wow doesn't that leave me an easy out in 08)
Every attempt I have made turns out to just be a small part of the whole. It is like trying to draw a hypercube on a piece of paper. All I wind up with is a bunch of weird looking triangles, rectangles and squares.
The way I used to look at security was as a sort of modified OSI model. (way back when)
Control Physical Access
Locked TC and DC doors, Building Access, Wireless Access Controls
Control Switching/Electrical Access
More Wireless access controls, Mac Filtering, NAC (if it ever works), VLAN’s
Control Routed Access
ACL’s, Good Subnetting (Yes I know a subnet doesn’t stop anything by itself, but if you don’t get the routing right everything else is harder), Proper DMZ/Extranet/Segmentation
Control of Application Connectivity
Firewalls, Tunnels, Some Proxy Functions,
Control of Sessions and early SoD
Session Segregation, Basic SoD, Identity Controls
Control of Data access and Presentation
Db Controls, Site/share/page access, More Identity Controls, Middle SoD
Application Controls and Control of data manipulation and metadata
Business SoD, Application Design, Business use of Application, More Identity Controls
This approach actually still works in many cases but it lacks a lot of essentials. It is almost purely tactical and has no self awareness. It also focuses too much on access control/preventative controls and not enough on mitigation and prioritization.
A lot of people who talked about the OSI model used to jokingly add a few more layers.
Politics, Religion, and Money
I am not so sure that is a bad idea but I would probably add a few more layers and call them:
Process, Policy, Governance, Compliance, and Money
in that order.
If you do that combined with the other layers it looks a bit like ISO 17799 domains doesn’t it? Well perhaps with some CoBiT Control Objectives thrown in.
There are a few differences though. Instead of interrelated overlapping domains you have sequential (potentially superseding) layers in both directions. These are layers where (for a given threat) you can show a certain level of protection. Multiple layers can be stacked for increasing sequential protection versus a threat from a given vector.
So let’s add these into the mix. Do they overcome the shortcomings? Well not completely. There is one thing still missing, visibility.
So feed visibility as a subset requirement into each of the layers.
As a quick example of that meme:
A firewall is valuable because it stops some attacks
If you are able to see how many attacks occur “outside” the firewall and compare them with how many attacks make it “inside” the firewall you have added value. The value isn’t directly added to the control that is the firewall. The value is added at the Process layer where an evaluation of the effectiveness of the firewall occurs and other controls can be used to mitigate the identified weaknesses. It might also be added at the Compliance layer where an organization might have to meet PCI requirements on proof of effectiveness of controls (specifically the firewall as a Control).
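A crude sketch of what I mean by measuring the layer, assuming you already have per-day attack counts from a sensor outside the firewall and one inside. The file names and the date,count format are made up for illustration; the point is the comparison, not the plumbing.

    # Compare attack counts seen outside vs. inside the firewall to estimate its effectiveness.
    import csv

    def load_counts(path):
        with open(path, newline="") as f:
            return {row["date"]: int(row["count"]) for row in csv.DictReader(f)}

    outside = load_counts("outside_sensor.csv")   # hypothetical export from the external sensor
    inside = load_counts("inside_sensor.csv")     # hypothetical export from the internal sensor

    for date in sorted(outside):
        seen_out = outside[date]
        seen_in = inside.get(date, 0)
        blocked = 1 - (seen_in / seen_out) if seen_out else 1.0
        print(f"{date}: {seen_out} outside, {seen_in} inside, ~{blocked:.1%} stopped at this layer")

That number by itself is just a curiosity; it becomes value when the Process layer uses it to decide what to fix next or the Compliance layer uses it as proof of control effectiveness.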
So what I was trying to say when I wrote:
“Vendors that are able to encompass the concept of measurable layers in security will emerge (or in the case of the few that are already out there will do well financially)”
Is that vendors that are able to add or combine either automated or easy to implement means of measuring effectiveness of the controls they peddle will add value.
Also
Vendors that facilitate the process of not only tying controls to specific effectiveness but also representing the effect of overlapping controls on overall risk mitigation will add a great deal of value.
If you can demonstrably add value then you can make money.
That’s what I meant …
Sort of
So now I am circling around to tag the originator of the chain letter.
19 December, 2006
Security Blog Chain Letter - Tagged
I think I will have a little fun with this one.
:)
10 Predictions
1. 30 percent of the predictions we make will be flat out wrong but we will conveniently forget that we made them. (or better yet read them in a way that makes them seem prescient anyway)
2. The only reason we do better than random on the accuracy of the predictions is because some of the items are so easy to foresee that my 13 year old pointed them out two years ago.
3. Something bad will happen in the next year.
4. Some good things will happen next year.
5. After pointing out only the items we were right on we will congratulate ourselves then make another series of lists next year.
OK, now that the obligatory curmudgeonry is done, the next five will be a bit more in line with the intent.
6. There will be one or more worms released targeting SCADA systems specifically and using vulnerabilities specific to them. Expect them to affect both historians and some PLC's.
7. There will be several fairly significant outages related to SCADA security failures but they won't be publicly identified as such. Possibly even a huge one. (left myself some leeway on that one didn't I)
8. Organizations (regardless of the type) that downplayed or reduced the capability of their Information Security teams will pay significantly in terms of incidents, stupid and improperly configured controls, and lost opportunities. (Most of them won't admit it though)
9. Vendors that are able to encompass the concept of measurable layers in security will emerge (or in the case of the few that are already out there do well financially)
10. Improperly performed vulnerability scans on control systems will get several people fired (or close to it). They might even be related to #7. - This one is for you, CNI Operator.
Oh Yea # 11
11. My Kids will cost me a lot of money but be worth every penny.
I'll tag Digitalbond now. Give us your predictions, Dale, or your hair will fall out and you will be forced to rely on blog marketed consulting gigs for income. (oh wait)
:)
14 December, 2006
SOX Compliance and Crumpled 3x5's
When I was a wee lad I had to take part in a management training meeting. Out of the week I was there, I got only one thing of value (unless you count the pleasant and far too expensive stay at the Times Square Marriott and the subsequent New York restaurant visits).
We did an exercise.
They divided the class up into about 10 groups of 4 to 5 people. They gave each group a bunch of 3x5 cards, a few rolls of cellophane tape, and a stapler with a bunch of staples (too bad it wasn't a red Swingline).
We had 3 minutes to plan; then, at the end of that time, we had 2 minutes to build a 5-foot-tall tower with our resources.
My team spent the three minutes dividing ourselves into an organized, highly efficient 3x5 card block creation assembly line and readying the floor space.
When the stopwatch started we started stapling the cards into small triangular blocks like good little assembly line workers. We made hundreds of them and passed them to our teammates, who dutifully organized and stacked them. The leader circled the tower applying tape to hold the layers together. We were incredibly efficient and hard working, we paid attention to every detail, and we were ultimately unsuccessful.
Our tower got to be almost 3 feet high when they rang the bell. It was a pretty tower and we worked hard on it, but in the end it fell short of the 5-foot goal by close to half.
Two of the groups did succeed.
One of them strung out long strips of tape and slapped the 3x5 cards to them lengthwise. They crumpled these into three tubes then taped them together at the top. It only took them about 30 seconds to finish.
The second group had everyone on the team watch each of the other groups. When they saw the tube guys they imitated them. I think they probably finished in about a minute. Their strategy was obviously to imitate a successful strategy. After all the goal wasn't to be first it was just to get over 5 feet in less than 2 minutes.
When I first got involved in SOX compliance pieces, specifically the attestation process, I felt either like the stapling person or (when I was in charge) like the group leader running around with the tape trying to hold the far too small (but very pretty and neatly organized) tower together.
Since then I have been through three successful audits at two different companies. One of which I helped manage.
13 December, 2006
Save the Users - or - Help Me Help You - CI4
About 6 to 10 years ago (I can't remember exactly when, but suppose it was about the time of Code Red or Nimda) I was staring at a pile of papers on my desk. They were a dump of that month's syslog and were about 6 inches high. The log for the previous month was in my hand and was only two pages long.
We had set up a pretty useful system for tracking down people that were trying to hack into our company. Our Internet-facing Cisco router served as the first layer of defense. There was an ACL that watched incoming traffic and dumped all but a few ports. For HTTP we got fancy and looked for some rudimentary "signatures" (about 40-50 of them) that caught things like Unicode attacks and a few other items. Next in line was a SNORT box. Both would log these events and forward them to a DMZ syslog server behind the firewall. We also forwarded our Checkpoint firewall (which was the next line of defense after SNORT) logs to that box.
I had some greps cron'ed to run periodically and forward their results to our SMTP server using a little mail script I wrote. HELO, MAIL FROM, RCPT TO, DATA, egrep, repeat. We had some Network General Sniffers that alarmed for certain specific types of traffic (mostly stuff that looked like scans) and forwarded an email to the same address. The system worked really well and had for several years. We would have about two or three false alarms a week and just a few real ones a year. We even managed to track a few of them down and got involved with authorities in the country they were in. (two convictions, one promotion [he worked for us in another country and was trying to fix things])
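For anyone who never had to build one of these, here is a minimal sketch of the idea in Python rather than shell. It is not the actual script I used; the file path, patterns, hosts, and addresses are all invented for illustration.

#!/usr/bin/env python
# Minimal sketch: grep an aggregated syslog for suspicious patterns and mail any hits.
# File path, patterns, hostnames, and addresses are placeholders, not the originals.
import re
import smtplib
from email.mime.text import MIMEText

LOGFILE = "/var/log/dmz-syslog.log"                    # router + SNORT + firewall dump
PATTERNS = [r"cmd\.exe", r"\.\./\.\.", r"portscan"]     # rudimentary "signatures"
ALERT_FROM = "ids-watch@example.com"
ALERT_TO = "security-team@example.com"
SMTP_HOST = "mail.example.com"

def find_hits(path, patterns):
    regexes = [re.compile(p, re.IGNORECASE) for p in patterns]
    with open(path, errors="replace") as fh:
        return [line.rstrip() for line in fh if any(r.search(line) for r in regexes)]

def mail_hits(hits):
    msg = MIMEText("\n".join(hits))
    msg["Subject"] = "Suspicious log entries (%d)" % len(hits)
    msg["From"] = ALERT_FROM
    msg["To"] = ALERT_TO
    smtp = smtplib.SMTP(SMTP_HOST)   # the library handles EHLO/MAIL FROM/RCPT TO/DATA
    smtp.sendmail(ALERT_FROM, [ALERT_TO], msg.as_string())
    smtp.quit()

if __name__ == "__main__":
    hits = find_hits(LOGFILE, PATTERNS)
    if hits:
        mail_hits(hits)

Cron it every few minutes and you have the same basic pipeline: filter, match, mail.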
It all changed overnight.
Pretty much everybody reading this blog is a security professional that went through this (or possibly a controls engineer who, I suspect, is about to go through it; remember the 8-to-10-year lag).
It started with the large-scale automated scans. Usually it was some idiot that had gotten hold of SATAN, SAINT, or an early ping sweep utility and didn't know how to use it right (honestly, these started several years before). They were irritating but you could filter them in your greps. Early versions of Nessus, along with NMAP and HPING, were more irritating because they were harder to filter and the ACL would miss chunks of them.
Then the worms ate into our brain.
Within a month or two those of us that had set up automated detection mechanisms were buried under an indecipherable morass of logs. Since then we as an industry have gotten a lot better at designing filters and managing the information chaos. Through a combination of layers, good design, luck, and major initiatives by IT vendors we have somehow gotten to an acceptable equilibrium with the worms (at least for now), but the root problem has never really been solved.
Staring at that pile of paper I had an idea. The only people who could fix this were the users, and the only organizations that could help them were the ISPs. The ISPs could help their users and make money at it at the same time.
I had dropped this idea for almost three years because ISPs started to give away AV for free, but recent events have revived it for me.
It is pretty simple really. The ISP (or someone hired by them) watches for suspect traffic from their address ranges. If they see hints of it they watch that address closer. If it is verified that the machine is acting improperly they use their systems to tie the address to a user and then to an email address. They all have the data, just in different formats; it might be RADIUS, MAC registrations, mail logins, cable modem registrations, or just access logs.
They then send an email to the user informing them that there is probably a security problem on one of their systems. If they go to this web site (linked in the email) and follow the instructions it can be cleaned for free. For a simple fee of $5 a month (added to their existing bill) they can be added to the premium security service that will help to maintain their system in a clean state. For $10 a month they can be added to the platinum service that includes additional services and advanced protections.
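If I were sketching the ISP-side plumbing, it might look something like this. The thresholds, lookup sources, and URL are all invented for the example; every carrier's real data lives in different systems.

# Rough sketch of the flow: suspect IP -> subscriber -> notification email.
# Thresholds and data sources are invented; a real ISP would pull from RADIUS,
# MAC registrations, cable modem provisioning, or plain access logs.
from collections import Counter

SUSPECT_THRESHOLD = 50   # e.g., events per hour seen by the honeypot/IDS for one source

def suspect_addresses(ids_events):
    """ids_events: iterable of (source_ip, event_type) tuples from the IDS or honeypot."""
    counts = Counter(ip for ip, _ in ids_events)
    return [ip for ip, n in counts.items() if n >= SUSPECT_THRESHOLD]

def address_to_subscriber(ip, radius_sessions):
    """radius_sessions: dict of ip -> (account_id, email) built from accounting logs."""
    return radius_sessions.get(ip)

def build_notification(account_id, email):
    body = ("We believe a computer on your connection may be infected.\n"
            "Visit https://security.example-isp.net/cleanup for a free scan,\n"
            "or add the premium service to your existing bill.")
    return {"to": email, "account": account_id, "body": body}

def run(ids_events, radius_sessions):
    notices = []
    for ip in suspect_addresses(ids_events):
        subscriber = address_to_subscriber(ip, radius_sessions)
        if subscriber:
            notices.append(build_notification(*subscriber))
    return notices   # hand these off to the existing billing and mail systems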
Think of it. It is targeted marketing to someone who definitely has a need. Probably someone who is ignorant of the product and industry but has been barraged with mainstream news panic stories so is primed to react.
The first objection I usually hear is, "Why would they open the mail? They'll think it's spam."
Hello!!! They are infected by a trojan or worm, so they obviously don't have that great of a brain-email-spam-phishing filter to begin with. Plus the carriers never need to ask for credit cards or other information. They build trust with a well-developed email and a clearly branded site. If they want to be careful they can verify any orders out of band. Any info security people I pitched this to years ago looked at it with a paranoid eye.
The user doesn't.
They are link lemmings.
Besides, it is certainly possible to send problem accounts an actual snail mail.
Next objection - Exploratory Cost
It would be somewhat different for every ISP, but most of the time the startup system would be very easy and inexpensive. You need some kind of honeypot or IDS to catch the bad traffic; chances are it already exists. You need to write a simple app to verify what traffic is actually bad, an app to link addresses to users, a site with a web-based AV and spyware scan (honestly, just use the product that is already being given away for free), and an email app. If it makes money from the startup design, then expand it to meet the needs/demand. Most ISPs already have these pieces; they just need to develop the offering. At the very least it would defer some of the AV costs; at most it becomes a tidy profit center in the long run.
Next Objection - Why not do it for free
Because it doesn't have to be free. Oh, the ISPs should still offer the free AV items, but if a user isn't savvy enough to use them then they might like a premium service that takes the brainwork out of it. A simple agent (uh oh, I said the A word) to make sure that the AV and anti-spyware apps are up to date and working well would do. For the premium service they might throw in shredding apps, child filters, a weekly security popup tip (that can be turned off, of course), utilities (semi optimized), and/or periodic human verification. Pick and choose the mix to compete with the other guys. Obviously the free AV approach isn't working that well any more.
Next Objection - Invasion of privacy!!!
First, they are already watching this traffic for troubleshooting and incident response anyway, so at most this will bring it to the user's attention (which is arguably a laudable goal in itself). Second, it is entirely possible to set this up using only a honeypot that has no other uses and doesn't originate connections. If they don't come to you then you don't look at their traffic. There would still be plenty of opportunities.
The ISP's make more money, the users have more secure systems, the rest of us have a slightly improved security environment at least until the next gen of the battle. Everyone wins but the illegal spammers and worms.
Just another crazy idea.
06 December, 2006
Bittorrent - and true virtualization - Crazy Idea?
A number of religious adherents jumped to its defense.
My reply was basically: yes, it can be used for good, and a tool is not in and of itself evil, but let's be honest, it is usually insecure and used for less-than-reputable things.
They do have some good backing. They managed to Raise Money.
This is probably because they do have very innovative mechanisms of transferring data.
Something that I think would be interesting to see is a combination of Grid computing with file storage and transfer mechanisms wrapped with security layers that are easy for the end user to configure and easy for the user of the grid to use to protect their data.
26 October, 2006
Layers – 100% compliance – Final of 4
Mike Rothman chimed in on the 100% compliance piece and did a far neater and faster summary of what I was trying to say.
This part brings me full circle to the original conversation on Risk Units and some of the differences between risk management and best practices.
Essentially, best practices are a bunch of smart (hopefully) guys sitting around at Gartner, Forrester, D&T, PwC, E&Y, SANS, and other groups, coming to a consensus on which controls come closest to 100% coverage for a given threat and which are the best controls to put in place.
(yes yes I know this is going to be an avalanche of what about this or that group)
This is great. It gives us an outside look at how various actions and tools compare to each other to help prevent problems, but it doesn't factor in all of the variables that each company and organization has.
It establishes a solid baseline and goals.
Coming up with best practices by definition includes dealing with the vendor marketing apparatus and all the fluff therein. It also is heavily based on the current trends, hype cycles, and opinions of what is really at issue.
In some companies a given best practice is just not possible because of political, environmental, architectural, economic or any number of other reasons. This is why it is more important to focus on what the real risk of an issue is.
There are a number of questions I like to keep in mind when looking at the effectiveness and appropriateness of controls being considered.
What threats does a control provide protection from and how?
Are there overlaps with other controls and for which threats/vulnerabilities?
In a perfect environment how much protection from any given threat does a control provide?
How much coverage can I afford to get with the given control?
How much does the control interfere with existing work?
How much does the control interfere with changes and limit future flexibility?
I would love to hear others.
This is why I am so interested in the “Units” and math that might be associated with it. I picture a type of finite element analysis that can be applied to Information Security controls.
(and before any structural engineers start laughing at me yes I know it is not the same. I am using it as an example not as a literal mathematical equivalency)
Even if we could come up with detailed equations for this stuff I realize most of the time they wouldn’t be used. I wouldn’t expect them to. When I was a Reactor Operator I didn’t do all of the full equations for every variable for every shim or pump switch. I did however have a thorough understanding of them and because of that knew exactly what would happen before I did it.
A different series of questions might be: what are the disparate pieces that make up a control? How do they interrelate? How do they fit into the greater piece of impact times likelihood?
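To make the impact-times-likelihood piece concrete, here is a toy calculation. Every number in it is invented purely to show the shape of the math, not to suggest real values for any control.

# Toy example of risk = impact x likelihood, with controls chipping away at likelihood.
# All numbers are invented for illustration.
impact = 500000           # loss in dollars if the threat event succeeds
base_likelihood = 0.30    # chance per year of the event with no controls in place

controls = {              # fraction of the remaining likelihood each control removes
    "external firewall": 0.8,
    "patching": 0.8,
    "HIPS": 0.8,
}

likelihood = base_likelihood
for name, reduction in controls.items():
    likelihood *= (1 - reduction)

print("residual likelihood: %.5f" % likelihood)           # 0.30 * 0.2^3 = 0.0024
print("annualized risk: $%.2f" % (impact * likelihood))   # 500000 * 0.0024 = $1200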
Update:
What I really want to know is if Batting Avg. or On base % is going to get me more scores in the end.
Update:
Securosis does a better job of describing the "Best Practices" process.
I love this quote from it.
"Analyst best practices will make you really fracking secure, but probably cost more than a CEOs parachute and aren’t always politically correct."
I am very aware of how the process works, so my hopes aren't dashed. His points are valid and more descriptive, but that level of detail wasn't essential for the point.
Still, more detail (and more accuracy as well) is better, so thanks for the critique.
25 October, 2006
Layering Controls –100% compliance - 3
Part 1 here
Part 2 here
So if you can’t get 100% with a single control how do you get 100% or close to it?
I'll use worms as the example because it is easy, not because I think they are the most likely current threat.
If you can stop 80% of the worms with your company's external firewall,
Then stop 80% of the remaining worms with segmentation to your PCN.
Then stop 80% with a NIPS device
Then stop 80% of the remaining with a Host based firewall
Then 80% with patching
Then 80% with HIPS
Then 80% with Memory Based Protection
Etc…
If you can get an 80% reduction with each layer, then six layers take a threat event that started at 100% certainty down to a residual likelihood of about 0.006%, and the seventh gets you to roughly 0.001%.
So the trick is identifying the applicable controls, determining how they (and how much they) reduce the likelihood, and if they can be layered with outer controls.
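For the arithmetic-minded, a few lines of Python show where those numbers come from. The 80% per layer is the same assumption as above, not a measured value, and the layer names are just the ones from the example.

# Residual likelihood after stacking layers that each stop 80% of what reaches them.
# Starts from a 100% certain threat event, per the example above.
residual = 1.0
layers = ["external firewall", "PCN segmentation", "NIPS", "host firewall",
          "patching", "HIPS", "memory protection"]

for n, layer in enumerate(layers, start=1):
    residual *= 0.2   # 80% stopped, 20% gets through to the next layer
    print("%d layer(s) (%s): %.6f%%" % (n, layer, residual * 100))

# 6 layers -> 0.0064%, 7 layers -> 0.00128%; call it thousandths of a percent.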
This is why I have been so interested lately with the risk conversations at RiskAnalys and Episteme.
If we can identify a relationship with the units of risk to controls that would be very valuable.
Final Section Here
24 October, 2006
Layering Controls –100% compliance can’t happen - 2
It is much more expensive to try to get a control to be 100% effective. Things have to be designed around it, more manpower has to be dedicated to policing the solution, and the solution becomes as likely or more likely to cause a loss of availability than the thing being protected against.
As an example, a colleague of mine designed a hyper-redundant Ethernet network to "ensure" connectivity to a particularly demanding user group. He used Spanning Tree as the mechanism. Any networking guys reading already know what happened. Long story short, they had far more frequent and complete outages thanks to the redundancies than due to equipment failure. (btw, if Spanning Tree is used properly it isn't a problem.) Constant reconvergence caused low-level problems, and any time there was a minor change to the network the entire thing would crash. This caused far more frequent and complete outages than the MTBF for the switches would have indicated if there had been only one path to each location.
Update:
Part 3
23 October, 2006
Layering Controls –100% compliance can’t happen - 1
To protect from worms on a system you have a lot of options. None of them are 100% effective. But many of them are 80% to 90% effective.
I am not an advocate of complete physical separation. The reason is simple: the organization that separates the system usually assumes that the solution is 100% effective. The reality is that someone, some time, is going to connect into it.
An organization I was in contact with several years ago did a great job separating their network. They had loads of documentation, did scans, and had clear policies and standards associated with their requirements. When Blaster broke out their business systems were pretty much unaffected. A week into the outbreak, a contractor hired to maintain their DCS got his MAC address approved through the proper channels and plugged into one of the isolated networks to monitor settings. Twelve hours (and much lost production) later they managed to get it cleaned up.
In this scenario the problem was that the separation actually made it more difficult to keep AV and patches up to date.
Update:
A quick clarification here for the non-SCADA security folks. The "isolated" networks approach is still heavily advocated in some areas of the DCS world, and many vendors' default approach is "just don't connect it to anything." Like IT and IS in the early '90s, they think they can be safe if they just don't connect. Many haven't realized that it isn't possible to totally isolate anymore. That said, isolation is a control, just not a very realistic one.
Goto Part 2 here
14 July, 2006
Layers
Introduction
From an infrastructure standpoint most organizations rely upon two key defenses to ensure the protection of their essential systems. Solid edge protection using firewalls and updated patching/antivirus form the root of these two key defenses. More advanced organizations have developed elaborate and comprehensive procedural elements to optimize the effectiveness of these protections. Internal firewalls have been implemented to help further protect critical assets such as PCN’s and key datacenters. This protection is essential but unfortunately there are inherent weaknesses in both firewalls and standard patching/AV signature deployment mechanisms that prevent even the most comprehensive programs from being totally effective. Adequate protection against threats in the current environment requires both in depth network protections and multi layered system protections in order to succeed.
Vulnerabilities
Firewalls cannot easily block traffic that goes to legitimate functions (TCP or UDP ports), and most standard deployments are unable to effectively analyze the content of packets to determine their probable impact on the protected end systems. This has resulted in the proliferation of exploits, and subsequent worms, that take advantage of this weakness, such as Sasser (the LSASS buffer overflow, port 445) and Nachi (the RPC DCOM buffer overflow, port 135 or 445). The underlying vulnerabilities expose systems to both worms and difficult-to-detect (without an IDS) hacking. Since the vulnerable ports are ones that need to be used for normal business transactions, it is impossible to block them. Even firewall systems that provide comprehensive packet inspection are often only point solutions and are unable to dynamically adjust the network to an attack. Identifying the problem is only one piece of the solution; it is still necessary to stop the subsequent attack.
Comprehensive patching and antivirus signature updates are the most effective way of dealing with the system-level vulnerabilities that a firewall is unable to address. In a perfect environment all machines would be 100% patched with the most recent fixes and therefore not vulnerable to remote exploitation. In the real world 100% patching is impossible. Comprehensive AV signature updates are more achievable, but 100% coverage in the few hours it might take for a malicious worm to spread after an exploit revelation is still a near impossibility. This is due to several factors. Central IT very rarely has direct access to all the machines that connect to the organization's network within 2 to 4 hours of a vulnerability announcement. Some systems are traveling, some are behind firewalls or other protection mechanisms, and many machines are not accessible to a central IT organization at all. Inaccessibility might be due to the machine being owned by another entity such as a contracting organization, or due to rigorous change control requirements and an inability (and/or unwillingness) to allow changes, such as exists on many PCN systems. Even when the central IT organization has direct control of the systems it often requires days or weeks to deploy a patch, and newer systems may not be appropriately configured to receive the patch. These elements combine to produce a coverage deficit of between 5% of systems (in very well administered patching environments) and greater than 25% of systems (in less well administered patching environments).
Risks
When combined, these fundamental flaws result in a significant breakdown of the overall information security of the company as a whole. The resulting risks are significant.
On the confidentiality side, the aggregate risk of even 5% of potentially accessible machines being exploitable gives external hackers the ability to "daisy chain" attacked systems together and effectively bypass many protections such as firewalls. The existence of a significant number of machines that can be compromised at a root level and that also contain databases (as with the LSASS exploit and Sasser) means that this data is readily accessible with minimal effort. Gathering system information from these machines allows other machines (that are not vulnerable to the original exploit) to also be compromised. The net effect is that without other protection mechanisms in place, most or even all of an organization's data is open to outside entities willing to make the effort of retrieving it.
The availability impact of these aggregate risks comes primarily from system or network loss caused by worms and other viruses. In many organizations this is easily measured by analyzing the impact of previous infections. That measurement should be modified by two factors. The first is that the time between vulnerability announcement and worm release has been shrinking, so the risk of an occurrence arriving before there is any ability to respond is increasing. The second is that few if any of the worms of the last several years have had an intentional payload, which means we have not seen anything approaching a worst case scenario. A reasonable scenario would be the unavailability of 10% to 50% of the entire IT infrastructure for up to one week, with possible complete unavailability for greater than one day and indefinite loss of much of the backup gap data. Any information on systems that are without backup processes (such as laptops and user desktops) has the potential of being irrevocably lost for a significant percentage of systems on the network.
Integrity risk due to the identified systemic flaws is slightly less catastrophic than the previous two risks. It comes primarily from intentional data manipulation by an undesired source via the mechanisms identified in the confidentiality risk section. Presumably this would be mitigated by working business processes that would identify problems before they reached material levels. In organizations that have a high level of non-compliance on patching it is possible that this type of manipulation could be hidden. Organizations that rely on trust and non-automated detective controls, rather than systemic segregation of duties, are more exposed to this risk.
Based on the aggregate of these identified risks it is conservative to place a value of up to 2% of an organization's yearly output at risk, with a fairly high likelihood of occurrence. Many organizations have had significant identified outages due to worms and hacking events in the last year. Several have lost more than a full work week for the entire organization. Many have incurred reputation damage, and a few have been subject to regulatory sanctions.
Frequent and recurring virus outbreaks highlight the existence of fundamental flaws that might also be avenues of exploitation by other security risks. They indicate a higher level of risk than would exist in an organization without frequent issues.
Solutions
In order to cost-effectively protect against the different threats that are prevalent in the working networked environment today it is necessary to defend at multiple locations in the network (in-depth protection) as well as at multiple layers on a single system (layered defense). Achieving in-depth protection relies on existing strategies such as firewalling and connection authentication as well as newer mechanisms such as Network Intrusion Prevention Systems (NIPS). Likewise, protection of systems at multiple layers combines older protections such as access control, patching, and antivirus with newer (relatively) strategies such as centrally controlled host-based firewalling, memory protection, and behavioral restrictions that can loosely be grouped together as Host Intrusion Prevention Systems (HIPS).
By placing NIPS at key locations it is possible to effectively segregate potential weaknesses in the architecture and to ensure that worst case scenario infections are contained. With careful location selection for the NIPS units it is possible to use the rule of two to dramatically decrease overall exposure with relatively low cost. Simply put, a properly located NIPS can reduce the number of vulnerable systems to a given exposure by up to half (usually substantially less) of the total existing machines. The actual number will be lower than half for each NIPS due to unequal distribution of systems within the overall networks and in most cases multiple access paths.
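A back-of-the-envelope version of the rule of two, with a made-up system population, looks something like this; the halving factor is the best-case assumption from the paragraph above, not a measured figure.

# Back-of-the-envelope "rule of two": each well-placed NIPS cuts the population of
# systems reachable by a given exploit roughly in half (usually less in practice).
# Population and halving factor are invented for illustration.
total_systems = 8000
halving_factor = 0.5   # best case; real segmentation is rarely this clean

exposed = total_systems
for nips in range(1, 5):
    exposed = int(exposed * halving_factor)
    print("after NIPS #%d: %d systems still reachable" % (nips, exposed))
# 8000 -> 4000 -> 2000 -> 1000 -> 500; HIPS then picks up the remaining key systems.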
Unfortunately, in order to reduce the risk below the level of materiality it is often essential to protect individual systems (or small clusters of them). This is where it is more cost effective to deploy HIPS. By comprehensively protecting key systems such as financial application servers and key process control systems with HIPS, and placing them on highly controlled and fully protected networks (all systems on the subnet have HIPS installed), total potential loss is limited to outages from network congestion, which the NIPS infrastructure constrains to small geographic or business regions. Point solutions can be flexible and site specific based on needs and still provide comprehensive protection.
Summary
Without action, an organization is at risk of significant/material loss due to catastrophic virus infection and/or undetected malicious activity. Existing mechanisms to deal with these threats are helpful and should be supported and expanded, but they are unable to effectively mitigate the total risk due to inherent weaknesses. Most organizations have already incurred repeated losses from this exposure. There are architectures and processes available that can effectively mitigate these risks. Organizations should investigate these mechanisms, determine the design most appropriate to them, and, if the cost is commensurate with the risk of potential losses, implement the systems.
13 July, 2006
A vision of an Ideal Process Security Environment
- Install preconfigured networking hardware
- Install Primary DCS server
- Install USB device provided by vendor
    Follow wizard to generate keys
    Lock USB device away just in case
    Follow wizard to identify networking hardware and other key settings/trusts
    If desired, integrate to MOC process/software for desired level of control
- Physically install new PLCs
    Go to the configuration screen and accept the PLCs individually
    Discover devices on the legacy PCN and accept them into the system
- Operate/engineer as normal
PLCs/Controllers
PLCs have default communication access mechanisms to ensure that they receive commands only from the proper locations.
- Asymmetric key pair (very likely too hard to administer, but still ideal)
    Installed in the factory
    Public key accessible to the purchaser, probably within the historian or DCS server via licensing
    Keys can be changed and updated via the appropriate DCS server on initial configuration and afterwards as needed
    This is used as an authentication mechanism to ensure that the controllers do not communicate with any other systems. They use SSH or another tunnel to communicate with each other and with the DCS servers so they are not easily subject to redirect attacks. (A rough sketch of this challenge-response idea follows this list.)
- Host-level firewall configured to allow the PLC to receive and send communications only in specific ways. All other traffic is dropped without response. This does not need to be an actual firewall; it could be done with a customized stack that only allows specific communications.
- Integrated SNMP (v3), syslog, or similar capabilities for logging and alerts, configurable via an authenticated trusted source and preconfigured to supply security data to a remote point
- Log ties changes to the authentication source and authority
- Failsafe settings can (but don't have to) require local physical action to change
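To make the asymmetric key pair idea less abstract, here is a rough challenge-response sketch using Ed25519 from the Python cryptography package. The real mechanism would obviously live in vendor firmware and the DCS security server; every name here is invented and this is only my sketch of the concept, not any vendor's implementation.

# Rough sketch of factory-keyed challenge-response between a DCS server and a PLC.
# Uses Ed25519 from the 'cryptography' package; all names are invented for illustration.
import os
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# At the factory: the controller gets a key pair; the public half ships with the license.
plc_private_key = Ed25519PrivateKey.generate()
plc_public_key = plc_private_key.public_key()     # registered with the DCS security server

def dcs_issue_challenge():
    """DCS server sends a random nonce to the controller."""
    return os.urandom(32)

def plc_answer_challenge(nonce):
    """Controller signs the nonce with its private key."""
    return plc_private_key.sign(nonce)

def dcs_verify(nonce, signature):
    """DCS server checks the signature against the controller's registered public key."""
    try:
        plc_public_key.verify(signature, nonce)
        return True
    except InvalidSignature:
        return False

nonce = dcs_issue_challenge()
print("authenticated:", dcs_verify(nonce, plc_answer_challenge(nonce)))

Rekeying on initial configuration would then just be repeating the exchange with a pair issued by the Primary DCS security server instead of the factory pair.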
DCS servers
- DCS servers (whether they are historians or more) have multiple layers of protection, all of which have configurations approved (and specifically defined) by the applicable vendors:
    A host-based firewall (HFW)
    Integrated communication authentication capabilities tied to the key structure used in the PLCs and elsewhere in the architecture
    Integrated signature-based IPS capability in the HFW, with signatures driven from a trusted, authenticated source
- Approved AV software with specific recommendations on DAT update mechanisms that are consistent with the specific AV vendor's methodologies
- Behavior-based IPS with a DCS-vendor-approved configuration
- Memory protection/control
- Integrated management architecture
    Release management capabilities for the servers, all software on them, and the associated controllers
    MOC (management of change) mechanisms with coordinated approval levels for changes on the server, for software, and for controllers
    Might (should?) be integrated with the AV and IPS update architecture
- Primary/Secondary DCS security servers
    The primary DCS server serves as the center of the key architecture for the PLCs and as a security aggregation point for interfacing with external security and authentication
    Security functions should be on the normal central DCS server
    Capable of redundant configurations
- Defined trust structure that will allow integration
Network
The network is divided into several segments.
- Firewall (or firewall IOS) controls access to all segments
    Stateful packet inspection
    Signature-based NIPS capability
    Secure remote monitoring and update capability
    Dynamic redundancy capability
        Power
        Devices (HA, VRRP, HSRP); load sharing not strictly necessary
        Availability-biased failure ability for interfaces
    Industrialized/static safe
    DCS vendor provides specific configurations for integration to their security architecture
- NAC (or similar mechanism) used to control access to each segment
    NAC splits the segments into two separate VLANs, Trusted and Untrusted
    Trusted VLAN is home to configured, authenticated systems (using the key structure to provide automated authentication)
    Untrusted VLAN has all traffic routed to an initial-configuration DCS security server
    (Optional) Default untrusted network for devices that connect that do not even have a manufacturer's key or similar capability but still have direct control functionality
- PIN Network (PCN DMZ)
    Serves as home to the historians and other DCS servers with open loop controlling functions, or those serving as data aggregation points for external feeds and monitoring
    Provides a neutral zone between vendors
    Provides interface capability to control functions
- PCN Network
    Home to PLCs and DCS servers with closed loop controlling functions
    Authentication for communication via NAC, with the key architecture providing access authentication
    NAC splits the PCN into two separate VLANs, Trusted and Untrusted PCN
    Trusted VLAN is home to configured PLCs and systems
    Untrusted PCN has all traffic routed to an initial-configuration DCS server
    (Optional) Default untrusted network for devices that connect that do not even have a manufacturer's key or similar capability but still have direct control functionality
    Separate PCNs possible for Redline (highly critical or safety essential) systems
- ESD Network
    Used as a protected network for Emergency Shutdown PLCs and associated servers/services
    Very tightly controlled access
    All changes logged, documented, and tied to an engineering authority
    Home of the key fail-safe mechanisms
- (Optional) Monitoring Network
    Home of controllers that have monitoring-only capability and do not participate in closed loop controlling functions
    Servers that provide outgoing data for troubleshooting and performance management
- (Optional) Utility Network
    Home to support servers and systems that need integration with DCS systems but serve no actual control functionality
- (Optional) Legacy Network
How it could work
The organization installs and configures the networking equipment in accordance with DCS vendor recommendations, leaving a legacy LAN (or LANs) for existing equipment. The Primary DCS security server is installed and configured, with the organization providing (or generating) its top-level key pair (and backing it up securely). Network authentication is configured to the server.
New controllers are connected to the PCN or Monitoring network. They try to authenticate to the network and either succeed based on preconfigured factory keys or fail and are routed to a secure server that uses the vendor default key to tell them they need to update their key pairs to ones provided by the Primary DCS security server. This could be automated, or the new devices could show up in an "unidentified" list that requires an operator to permit key distribution. The configured and identified controllers send/stream log data to the DCS security server along with their normal traffic. If a controller does not have the capability to handle a key, its MAC is used to assign it to a legacy PCN and allow future access from that separate, controlled VLAN.
Controller software and possibly firmware updates are periodically checked and updated (after engineering authority approval) from the Primary DCS server. Trust relationships are strictly controlled and limited to information access in default settings. All setting changes are logged, and they can be configured to require a vote for permission from the system authority, with different levels of change capability for operators, administrators, and MOC approval.
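Pulling that flow into a few lines of Python makes the decision tree easier to see. The VLAN names, key checks, and approval hook are all invented, and this is only my sketch of the logic, not any vendor's implementation.

# Sketch of the onboarding decision tree described above. VLAN names, key checks,
# and the approval step are all invented for illustration.

TRUSTED_VLAN = "pcn-trusted"
UNTRUSTED_VLAN = "pcn-untrusted"   # routed only to the initial-configuration DCS server
LEGACY_VLAN = "pcn-legacy"         # MAC-registered devices with no key capability

def issue_site_key(device):
    """Stand-in for the Primary DCS security server issuing a new site key."""
    return "site-key-for-" + device["mac"]

def onboard_device(device, site_keys, factory_keys, operator_approves):
    """device: dict with 'mac', 'key_id', 'supports_keys'. Returns the VLAN it lands on."""
    if not device["supports_keys"]:
        return LEGACY_VLAN                     # pinned by MAC, limited future access
    if device["key_id"] in site_keys:
        return TRUSTED_VLAN                    # already carries a site-issued key
    if device["key_id"] in factory_keys:
        if operator_approves(device):          # could be fully automated, per local policy
            site_keys.add(issue_site_key(device))
            return TRUSTED_VLAN
        return UNTRUSTED_VLAN                  # waiting in the "unidentified" list
    return UNTRUSTED_VLAN                      # unknown key material stays isolated

# Example: a brand-new controller arriving with only its factory key
new_plc = {"mac": "00:11:22:33:44:55", "key_id": "factory-abc", "supports_keys": True}
print(onboard_device(new_plc, set(), {"factory-abc"}, operator_approves=lambda d: True))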
The organization installs and configures the networking equipment in accordance with DCS vendor recommendations leaving a legacy LAN (or LANs) for existing equipment. The Primary DCS security server is installed and configured with the organization providing (or generating) its top level key pair (and backing it up securely). Network authentication is configured to the server. New controllers are connected to the PCN or Monitoring network. They try to authenticate to the network and either succeed based on preconfigured factory keys or fail and are routed to a secure server that will use the vendor default key to tell them they need to update their key pairs to ones provided by the Primary DCS security server. This could be automated or the new devices could show up in "unidentified" list that requires an operator to permit key distribution. The configured and identified controllers send/stream log data to the DCS security server along with their normal traffic. If the controller does not have the capability to handle a key its MAC is used to assign it to a legacy PCN and allow future access from that separate controlled VLAN. Controller software and possibly firmware updates are periodically checked and updated (after engineering authority approval) from the Primary DCS server. Trust relationships are strictly controlled and limited to information access in default settings. All setting changes are logged. All setting changes can be configured to require a vote for permission from the system authority. Different levels of change capability for operators, administrators and for MOC approval.