Quick start: ICAP server. Current trends in content filtering

For correct integration of the system, the organization's proxy server must also be configured. The general requirement is that the proxy server be pointed at the IP address of the SecureTower ICAP server, and that the proxy's ICAP module be configured so that every request sent to the ICAP server carries an X-Client-IP header containing the user's IP address. Requests without this header will be accepted but will not be served by the ICAP server.
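
For illustration, the beginning of a REQMOD request carrying the required header might look like this (the addresses here are examples, not values from the SecureTower documentation):

REQMOD icap://192.168.45.1:1344/reqmod ICAP/1.0
Host: 192.168.45.1
X-Client-IP: 192.168.45.101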

Among others, SecureTower supports integration with the most popular proxy servers, Squid and MS Forefront.

SQUID

The SecureTower system supports Squid version 3.0 and later (the branch in which ICAP support appeared). When installing/compiling the proxy server, enable ICAP support and specify the following options in the ICAP settings:

  • icap_enable on
  • icap_send_client_ip on - send the client's IP address
  • icap_service service_req reqmod_precache 0 icap://192.168.45.1:1344/reqmod, where 192.168.45.1 is the IP address of the SecureTower ICAP server
  • adaptation_access service_req allow all
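
Taken together, a minimal squid.conf fragment for this integration might look like the sketch below (the service name and addresses are the examples from the list above; the exact icap_service/adaptation_access syntax varies between Squid versions):

icap_enable on
icap_send_client_ip on
icap_service service_req reqmod_precache 0 icap://192.168.45.1:1344/reqmod
adaptation_access service_req allow all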

MS Forefront

To work in networks built on the Forefront TMG proxy server, you must additionally install an ICAP plug-in, because this proxy does not support ICAP out of the box. The plugin is available at http://www.collectivesoftware.com/solutions/content-filtering/icapclient.

In the settings of the ICAP plugin, specify the address of the SecureTower ICAP server. As a result, all data transferred over the HTTP(S) protocol through the MS Forefront proxy server will be stored by the SecureTower ICAP server.

Minimum system requirements for ICAP server

  • Processor: 2 GHz or higher, 2 cores or more
  • Network adapter: 100 Mbps/1 Gbps
  • RAM: at least 6 GB
  • Hard drive: a 100 GB partition for the operating system and SecureTower files, and a second partition for storing intercepted data, sized at 1.5 GB per monitored user per month plus 3% of the intercepted data volume for search index files (see the worked example after this list)
  • .NET Framework: 4.7 or later
  • Operating system: Microsoft Windows Server 2008R2/2012/2016 x64
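
As a rough worked example of the disk sizing rule (the figures are illustrative): for 100 monitored users retained for 12 months, the data partition needs about 100 × 1.5 GB × 12 = 1800 GB for intercepted data, plus 3% (≈ 54 GB) for search index files, i.e. roughly a 1.9 TB partition in total.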

Administration

I took part in beta testing of Dr.Web's ICAP daemon and was satisfied with it (despite some problems that remain unresolved at the moment), but the financial side of the issue severely limits me, so once again my choice fell on ClamAV.

Using Squid with ClamAV and c-icap to scan web traffic for viruses

Background

A running clamd daemon is not needed for this setup, so you can safely skip configuring it (clamd.conf) if you don't use it or won't.

c-icap's antivirus module is built on ClamAV, so libclamav must be installed on the system (ClamAV installed in the usual way is enough). If libclamav is not present, c-icap simply will not build.

Installing and configuring c-icap with ClamAV support

Unpack the c_icap-220505.tar.gz archive to /usr/src (or wherever you have the source codes). The configure script in the c-icap source directory should be run with the following options:

$ ./configure --enable-static --with-clamav --prefix=/usr/local/c_icap

Or, for example, like this, if ClamAV was configured with --prefix=/opt/clamav:

$ ./configure --enable-static --with-clamav=/opt/clamav --prefix=/usr/local/c_icap

The c-icap daemon is built statically; --prefix can be set to taste. Next, build the daemon itself:

$ make

Check that everything built correctly:

$ make check

And directly install c-icap into the system (to the directory that was specified via --prefix):

# make install

Now we need to adjust some settings in c-icap.conf. With our --prefix=/usr/local/c_icap, it is easy to guess that the configs live in /usr/local/c_icap/etc.

  • It is better to set User to nobody, since the default wwwrun most likely does not exist on your system.
  • TmpDir /tmp is the directory for temporary files.
  • Next, configure the ACL (Access Control List), the list of IP addresses allowed to use this ICAP daemon:

    acl localsquid_respmod src 127.0.0.1 type respmod
    acl localsquid src 127.0.0.1
    acl externalnet src 0.0.0.0/0.0.0.0
    icap_access allow localsquid_respmod
    icap_access allow localsquid
    icap_access deny externalnet

    This is how we define where access to our ICAP service is allowed from, and where it is not. Note that these ACLs list not the proxy's own clients, but the clients of the ICAP daemon, i.e. the proxy servers themselves (their IP addresses).

    I have compiled an ACL for the case where the ICAP daemon and Squid are running on the same host.

    • srv_clamav.ClamAvTmpDir /tmp - the temporary directory for the ClamAV module.
    • srv_clamav.VirSaveDir /var/infected/ - the quarantine directory. It is better to comment out any other similar entries!
    • srv_clamav.VirHTTPServer "DUMMY".

    You can also try like this:

    srv_clamav.VirHTTPServer "http://proxy.your_srv_name.ru/cgi-bin/get_file.pl?usename=%f&remove=1&file="

    Some clarification is needed: the srv_clamav.VirSaveDir option can be given several times, so infected files can be saved to several locations. If one of the quarantine directories is set to the web server's document root, users can be given the chance to knowingly download an infected file. All that remains is to use the contrib/get_file.pl script from the c-icap sources.

    I didn't need it.

Create the /var/infected directory and make it owned by the nobody user (chown nobody /var/infected).
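
For example (assuming the same paths as above):

# mkdir -p /var/infected
# chown nobody /var/infected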

Let's test run c-icap:

# cd /usr/local/c_icap/bin
# ./c-icap

If there are no error messages, then you should also make sure that c-icap is listening on the right socket:

# netstat -apn | grep 1344

If we see something similar to the following line, everything is in order:

tcp 0 0 *:1344 *:* LISTEN 24302/c-icap

Let's leave the c-icap daemon running and move on to further settings.

Installing and configuring the Squid proxy server

Let's unpack the previously obtained Squid into /usr/src:

# tar zxvf squid-icap-2.5.STABLE11-20050927.tgz

Change to the Squid source directory and run configure like this:

$ ./configure --enable-icap-support

If you are using Squid from Dr.Web, you need to run bootstrap.sh, located in the root directory of the Squid source codes, before running configure, and be sure to read the documentation from the drweb-icapd package!

Build Squid:

$ make

Install:

# make install

We have Squid installed in /usr/local/squid. Now let's change the settings in squid.conf.

You need to find a couple of lines:

#acl our_networks src 192.168.1.0/24 192.168.2.0/24
#http_access allow our_networks

Uncomment them and substitute your own value for 192.168.1.0/24 192.168.2.0/24 (in my case, the proxy server users were on the 172.16.194.0/24 network):

acl our_networks src 172.16.194.0/24
http_access allow our_networks

Go to /usr/local/squid/var and create a cache directory. Then run there:

# chown nobody cache/ logs/

The change of ownership is necessary because the proxy daemon will run as the nobody user and would otherwise be unable to write logs or use the cache.

It remains to create a directory structure for caching. Go to /usr/local/squid/sbin and run:

# ./squid -z

By default, the cache_dir parameter in squid.conf is set to:

cache_dir ufs /usr/local/squid/var/cache 100 16 256

You can change the path to the cache (for example, if it is located on another partition or hard drive), and then you need to check the rights to the directory you specified.
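
For example, if the cache lives on a separate partition mounted at /mnt/squid-cache (the path and sizes here are illustrative), the directive might become:

cache_dir ufs /mnt/squid-cache 20000 16 256

and the directory must be made writable by the nobody user (chown -R nobody /mnt/squid-cache).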

At this stage we have a working Squid, but without ICAP support, i.e. a regular caching proxy server.

Let's add ICAP support...

Adding ICAP support to squid.conf

Find the word icap_enable and set icap_enable on. Find icap_preview_enable and set icap_preview_enable on. Find icap_preview_size and set icap_preview_size 128. Find icap_send_client_ip and set icap_send_client_ip on. Then find icap_service and add these two ICAP services:

icap_service service_avi_req reqmod_precache 0 icap://localhost:1344/srv_clamav
icap_service service_avi respmod_precache 1 icap://localhost:1344/srv_clamav

Search for icap_class and add the following icap class:

icap_class class_antivirus service_avi service_avi_req

Search for icap_access and add the following permissions:

icap_access class_antivirus allow all

In total, the following lines should be added to squid.conf to support ICAP:

icap_enable on
icap_preview_enable on
icap_preview_size 128
icap_send_client_ip on
icap_service service_avi_req reqmod_precache 0 icap://localhost:1344/srv_clamav
icap_service service_avi respmod_precache 1 icap://localhost:1344/srv_clamav
icap_class class_antivirus service_avi service_avi_req
icap_access class_antivirus allow all

This completes the minimal configuration of the proxy server.

Let's run it:

# cd /usr/local/squid/sbin
# ./squid

If everything is correct, then there should be no messages in the console.

Health check

Add the proxy server to your browser settings (if proxying is not transparent) and open the page http://www.eicar.com/anti_virus_test_file.htm.

Try downloading the eicar.com file. If you see a similar message: "A VIRUS FOUND ..." - then everything is working correctly.

Please note that the cache of the proxy server must not contain infected objects! Therefore, it is better to clear the cache before using Squid with c-icap. Also note that the browser has its own cache.

Updating anti-virus databases ClamAV

Add freshclam to crontab. The c-icap virus databases are reinitialized every srv_clamav.VirUpdateTime minutes; this parameter can be set in c-icap.conf (15 minutes by default).
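
For example, a crontab entry that runs the update every three hours might look like this (the freshclam path is an assumption; check yours with "which freshclam"):

0 */3 * * * /usr/local/bin/freshclam --quiet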

c-icap.magic file and types of checked objects

This file lives in the same directory as c-icap.conf. It describes the formats of various groups of file types (TEXT, DATA, EXECUTABLE, ARCHIVE, GRAPHICS, STREAM, DOCUMENT are the groups defined in c-icap.magic by default). Anti-virus scanning is based on the types of the files passing through the proxy server. You can, for example, exclude some types or add your own.

The format of the line entry, to determine the file by its magic number (sequence):

Offset:Magic:Type:Group:Desc

Offset - the offset at which the magic sequence starts. Type and Group - the type and group to which a file with this magic sequence should be assigned. Desc - a short description that carries no technical load.

See c-icap.magic for an example.
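
As an illustrative entry in this format (not necessarily present verbatim in your c-icap.magic): a file whose first two bytes at offset 0 are MZ would be classified as an executable like this:

0:MZ:MSEXE:EXECUTABLE:MS executable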

Also note that the srv_clamav.ScanFileTypes parameter in c-icap.conf defines the groups and types of files (both groups and types can be specified) that should be scanned. What exactly srv_clamav.VirScanFileTypes defines I have not fully understood, but I suspect it is the file groups that are always scanned (EXECUTABLE and ARCHIVE by default).

In my c-icap config, the above options look like this:

srv_clamav.ScanFileTypes TEXT DATA EXECUTABLE ARCHIVE GRAPHICS STREAM DOCUMENT
srv_clamav.VirScanFileTypes EXECUTABLE ARCHIVE

Possible problems

  • Squid returns an "ICAP protocol error" message and no pages open. Check whether you specified the ACL correctly in c-icap.conf: this ACL must allow access not for users, but for the proxy server.

    Try terminating the Squid and c-icap processes and then starting them in the following order: c-icap first, then Squid.

    Also, this error can occur if the c-icap daemon does not have enough permissions to write to the quarantine directory or log files.

    If the problem still persists, try starting Squid with the -d 10 -N -X options:

    # ./squid -d 10 -N -X

    And c-icap with the -N -d 10 -D options:

    # ./c-icap -N -d 10 -D

    The detailed output should help you figure out what is going wrong and where.

  • Squid only gives ICAP protocol error on some pages (on the same pages).

    Check whether c-icap has permission to write to the quarantine directory (better still, make the user under which c-icap runs the owner of all quarantine directories).

    Try running c-icap and Squid in debug mode (see above for how to do this).

    It's also a good idea to look at the c-icap logs.

    Try loading the object that is causing the error again. Perhaps you will learn a lot more about the problem and be able to solve it.

Results

Now web surfing is also protected from viruses and other malicious code (including some exploits for MS IE). This method has not been tested as a corporate solution for a heavily loaded server, but I think it could be made to work (if only because the load can be distributed over several ICAP servers). As a solution for a small organization it is quite adequate.

And remember what the developers write on their website:

  > The Antivirus ClamAV service
  > This service is under development.

You can learn about some principles of the ICAP protocol in Russian from the DrWeb-ICAP manual - one of the successful commercial implementations of the ICAP protocol. You can also read RFC 3507.

Comfortable and safe work!

Thank you for your attention.

Definition - What does it mean?

Internet Content Adaptation Protocol (ICAP) is a lightweight protocol providing simple object-based content vectoring for HTTP services. ICAP is used to extend transparent proxy servers. This frees up resources and standardizes the implementation of new features. It uses a cache to proxy all client transactions and process the transactions using ICAP Web servers, which are designed for specific functions such as virus scanning, content translation, content filtering or ad insertion.

ICAP performs content manipulation as a value added service for the appropriate client HTTP request or HTTP response. Thus the name "content adaptation."

This term is also known as Internet Content Adaption Protocol.

Techopedia explains Internet Content Adaptation Protocol (ICAP)

Internet Content Adaptation Protocol was proposed in 1999 by Danzig and Schuster of Network Appliance. Don Gillies enhanced the protocol in 2000, allowing pipelined ICAP servers; all three encapsulations permitted by HTTP 1.1 are supported. He also produced training materials for vendors around 2005.

ICAP leverages caches and proxies to help produce value-added services, which can be off-loaded from Web servers to ICAP servers. Web servers can then be scaled for raw HTTP throughput.

Despite the similarity, ICAP is not HTTP. And it is not an application running over HTTP.

At present, content filtering cannot be singled out as a separate area of computer security: it is too intertwined with the other areas. Content filtering is very important for security because it makes it possible to isolate potentially dangerous content and handle it correctly. Approaches that emerged in the development of content filtering products are now used in products for preventing intrusions (IDS), the spread of malicious code, and other harmful activity.

New content filtering technologies and products make it possible to create additional services for users, improve the quality of protection, and not merely counter existing threats but prevent entire classes of new ones.

New trends in content filtering

One of the general trends in the development of information security products is the desire to implement various functions in one device or software solution. As a rule, developers try to implement solutions that, in addition to content filtering, also perform the functions of an antivirus, a firewall, and/or an intrusion detection and prevention system. On the one hand, this allows companies to reduce the costs of purchasing and maintaining security systems, but on the other hand, the functionality of such systems is often limited. For example, in many products, Web traffic filtering functions are limited to checking site addresses against some database of site categories.

Another general trend is the development of products in accordance with the Unified Threat Management (UTM) concept, which provides a unified approach to threat prevention regardless of which protocol or data is processed.

This approach avoids duplication of protection functions and ensures that the data describing threats is current for all controlled resources.

In the areas of content filtering that have existed for quite a long time, control of mail and Internet traffic, changes are also taking place and new technologies are emerging.

Anti-phishing features have come to the fore in mail exchange control products. And in products for monitoring Internet traffic, there is a shift from using pre-prepared databases of addresses to categorization by content, which is a very important task when working with a variety of portal solutions.

In addition to the two areas mentioned above, there are also new areas of application of content filtering - some time ago, products began to appear to control the transfer of instant messages (instant messaging) and peer-to-peer (p2p) connections. Currently, products for monitoring VoIP traffic are also being actively developed.

Many countries have actively begun to develop tools for intercepting and analyzing many types of information used in various investigations (lawful interception). These activities are carried out at the state level and are most often tied to the investigation of terrorist threats. Such systems intercept and analyze not only data transmitted via Internet channels, but also via other types of communication: telephone lines, radio channels, etc. The most famous system for intercepting information is Echelon, a system used by US intelligence to collect information. In Russia, there are also various implementations of the system of operational-search measures (SORM), which are used to capture and analyze information in the interests of the special services.

One trend in the market of content filtering products is the mass consolidation of the companies producing such solutions. Although this trend mostly reflects the organizational side of the process, it can lead to new products and directions for companies that previously either lacked these lines of business or held only an insignificant share of that market. The following mergers/acquisitions illustrate the point:

  • in the summer, Secure Computing, which a year earlier had bought CyberGuard with its good set of Internet traffic filtering tools, merged with CipherTrust, a company with extensive experience in developing email traffic filtering tools;
  • MailFrontier, which produced tools for protecting mail traffic, was absorbed by SonicWall, which previously had no solutions of comparable maturity;
  • at the end of July 2006, SurfControl, known for its content filtering solutions, bought BlackSpider, which provided advanced computer security services;
  • at the end of August 2006, the most grandiose takeover took place: Internet Security Systems (ISS) signed a merger agreement with IBM. This merger is an example of the great interest in information security on the part of large software companies;
  • in January 2007, Cisco acquired IronPort, which has a strong line of email security products;
  • Microsoft has acquired several information security companies over the past few years. The largest of these was the takeover of Sybari, with its line of antivirus and other malware protection, as well as e-mail and instant message content filtering. The acquisition of Sybari and other companies allows Microsoft to compete successfully in the computer security market, which is new for it.

It is also worth noting that in recent years open source products for content filtering have begun to appear. In most cases they do not achieve the same functionality as commercial applications; however, there are specific solutions and applications where they can pose real competition.

Modern Threats

Modern IT infrastructure is subject to many attacks, targeting both ordinary users and companies, regardless of their size. The most relevant are the following types of threats:

  • phishing - recently widespread methods of intercepting important user data (passwords, credit card numbers, etc.) using social engineering, where a fake letter or message from some organization tricks the user into entering certain data on a site controlled by the attacker;
  • spyware and malware - various tools for intercepting data or taking control of a computer. There are many varieties of such tools, differing in how dangerous they are to the computer: from simply displaying advertising messages to intercepting the data entered by users and seizing control of computer operations;
  • viruses and other malicious code - viruses, worms and Trojans are a long-known threat to IT infrastructure. But every year new modifications of malicious code appear, often exploiting vulnerabilities in existing software that allow them to spread automatically;
  • SPAM/SPIM - unsolicited messages transmitted via e-mail (SPAM) or instant messaging (SPIM) force users to waste time processing unwanted correspondence. Currently, SPAM accounts for more than 70% of all e-mail messages transmitted;
  • infrastructure attacks - the IT infrastructure of companies is vitally important, and attacks aimed at disabling it are extremely dangerous. They may involve whole networks of computers infected with a virus used to seize control. For example, some time ago a virus circulated that carried code intended to launch a distributed attack on Microsoft websites at a given time in order to disable them. Several million computers were infected, and only a bug in the virus code prevented the planned attack;
  • business information leaks - preventing such leaks is one of the main tasks of content filtering products. A leak of important information can cause a company irreparable damage, sometimes comparable to the loss of its fixed assets. Therefore, many products include tools to detect covert data transmission channels, such as steganography;
  • threat of prosecution - this threat is extremely relevant for companies whose employees can use file-sharing networks, downloading and/or distributing music, films and other copyrighted content. Litigation is also possible over the dissemination of libelous and/or defamatory information concerning third parties.

The first five types of threats affect both home computers and computers in corporate networks. But the last two threats are especially relevant for companies of all kinds.

Web traffic filtering

Recently, various changes have been taking place in the field of Internet traffic filtering, due to the emergence of new filtering technologies and changes in the technologies that are used to build Internet sites.

One of the most important trends in the development of content filtering products in terms of Internet traffic control is the transition from using databases of site categories to determining the category of a site by its content. This has become especially relevant with the development of various portals, which may contain content of different categories that changes over time and/or adjusts to the client's settings.

Recently popular technologies and tools for building Internet sites, such as Ajax, Macromedia Flash and others, require changes in Internet traffic filtering technologies.

The use of encrypted channels for interacting with Internet sites ensures the protection of data from interception by third parties, but at the same time, important information can be leaked through these data transmission channels or malicious code can enter computer systems.

The problem of integrating security tools with systems that ensure the functioning of the IT infrastructure, such as proxy servers, web servers, mail servers, directory servers, etc., remains relevant. Various companies and non-profit organizations are developing protocols for interaction between different systems.

The current state of affairs in this area will be discussed below.

Approaches to the categorization of sites and data

  • use of predefined databases of site categories, with regular updates to the lists of sites and categories;
  • categorization of data on the fly, by analyzing the content of pages;
  • use of category information provided by the site itself.

Each of these methods has its own advantages and disadvantages.

Predefined site category databases

Using pre-prepared databases of site addresses and related categories is a long-used and well-established method. Currently, such databases are provided by many companies, such as Websense, SurfControl, ISS/Cobion, Secure Computing, Astaro AG, NetStar and others. Some companies use these databases only in their own products; others allow them to be connected to third-party products. The databases provided by Websense, Secure Computing, SurfControl and ISS/Cobion are considered the most complete: they contain information about millions of sites in different languages and countries, which is especially important in the era of globalization.

Data categorization and the building of category databases are usually carried out semi-automatically. First, content analysis and category determination are performed by specially developed tools, which may even include text recognition in images. At the second stage, the resulting information is often checked by people, who decide which category a particular site belongs to.

Many companies automatically replenish the category database based on the results of work with clients if a site is found that has not yet been assigned to any of the categories.

There are currently two ways to connect predefined site category databases:

  • using a local database of categories with regular updates. This method is very convenient for large organizations that have dedicated filtering servers and serve a large number of requests;
  • using a category database hosted on a remote server. This method is often used in various devices: small firewalls, ADSL modems, etc. Using a remote category database slightly increases the load on the channels, but ensures that an up-to-date category database is used.

The advantage of using predefined category databases is that access is granted or denied at the moment the client issues the request, which can significantly reduce the load on data transmission channels. The main drawback is the delay in updating the site category databases, since analysis takes time. In addition, some sites change their content quite often, making the category information stored in the address database stale. Some sites may also serve different information depending on the user name, geographic region, time of day, and so on.

Categorizing data on the fly

One simple way to implement such a solution is to use Bayesian algorithms, which have proven themselves quite well in the fight against spam. However, this option has its drawbacks: it must be retrained periodically and its dictionaries adjusted to match the transmitted data. Therefore, some companies supplement simple methods with more complex algorithms for determining a site's category from its content. For example, ContentWatch provides a special library that analyzes data using linguistic information about a given language and, on that basis, determines the category of the data.

Categorizing data on the fly makes it possible to react quickly to the emergence of new sites, since category information does not depend on a site's address, only on its content. But this approach also has drawbacks: all transmitted data must be analyzed, which somewhat reduces system performance, and up-to-date category databases must be maintained for various languages. Nevertheless, some products take this approach while also using site category databases. Examples include the Virtual Control Agent in SurfControl products and the data category determination mechanisms in the Dozor-Jet SKVT.

Category data provided by sites

In addition to address databases and on-the-fly content categorization, there is another approach to determining the category of sites - the site itself reports which category it belongs to.

This approach is primarily intended for use by home users where, for example, parents or teachers can set a filtering policy and/or keep track of which sites are visited.

There are several ways to implement this approach to resource categorization:

  • PICS (Platform for Internet Content Selection) is a specification developed by the W3C consortium about ten years ago, with various extensions aimed at ensuring the reliability of the rating system. Specially developed software, available for download from the project page, can be used for enforcement. More information about PICS can be found on the W3.org website (http://www.w3.org/PICS/).
  • ICRA (Internet Content Rating Association) is a newer initiative run by the independent non-profit organization of the same name. The main goal of this initiative is to protect children from prohibited content. The organization has agreements with many companies (major telecommunications and software companies) to provide more reliable protection.
    ICRA provides software that checks a special label returned by a site and decides whether to grant access to the data. The software runs only on the Microsoft Windows platform, but thanks to the open specification, filtering implementations can be created for other platforms as well. The goals and objectives of the organization and all necessary documents are available on the ICRA website - http://www.icra.org/.

The advantage of this approach is that only special software is needed to process the data, and no address or category databases need updating, since all information is transmitted by the site itself. The downside is that the site may report the wrong category, leading to access being wrongly granted or denied. However, this problem can be (and already is being) addressed through data validation means such as digital signatures, etc.

Traffic Filtering in the Web 2.0 World

The massive introduction of so-called Web 2.0 technologies has greatly complicated content filtering of web traffic. Since in many cases data is transmitted separately from the page design, there is a risk of unwanted information passing to or from the user. When working with sites that use such technologies, the transmitted data must be analyzed comprehensively: detecting the transfer of additional information and taking into account data collected at previous stages.

Currently, none of the companies that produce web traffic content filtering tools allows for complex analysis of data transmitted using AJAX technologies.

Integration with external systems

In many cases, the issue of integrating content analysis systems with other systems becomes quite acute. At the same time, content analysis systems can act as both clients and servers, or in both roles at once. For these purposes, several standard protocols have been developed - Internet Content Adaptation Protocol (ICAP), Open Pluggable Edge Services (OPES). In addition, some manufacturers have created their own protocols to allow specific products to communicate with each other or with third-party software. These include the Cisco Web Cache Coordination Protocol (WCCP), Check Point Content Vectoring Protocol (CVP), and others.

Some protocols - ICAP and OPES - are designed in such a way that they can be used to implement both content filtering services and other services - translators, advertising placement, data delivery, depending on the distribution policy, etc.

ICAP protocol

Currently, the ICAP protocol is popular among content filtering software authors and software developers for detecting malicious content (viruses, spyware/malware). However, it is worth noting that ICAP was primarily designed to work with HTTP, which imposes many restrictions on its use with other protocols.

ICAP has been adopted as a standard by the Internet Engineering Task Force (IETF). The protocol itself is defined in RFC 3507, with some additions outlined in the ICAP Extensions draft. These documents and additional information are available from the ICAP Forum server - http://www.i-cap.org.

The system architecture when using the ICAP protocol is shown in the figure above. The ICAP client is the system through which traffic passes. The system that analyzes and processes the data is called the ICAP server. ICAP servers can act as clients to other ICAP servers, which allows several services to be chained together to process the same data.

Communication between the client and the server uses a protocol similar to HTTP version 1.1, with the same ways of encoding information. According to the ICAP standard, both outgoing (REQMOD - Request Modification) and incoming (RESPMOD - Response Modification) traffic can be processed.
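
As a sketch of the exchange (modeled on the examples in RFC 3507; the null-body offset equals the byte length of the encapsulated header section, CRLFs included), a REQMOD request wrapping an original HTTP request looks roughly like this:

REQMOD icap://icap-server.net/reqmod ICAP/1.0
Host: icap-server.net
Encapsulated: req-hdr=0, null-body=78

GET / HTTP/1.1
Host: www.origin-server.com
Accept: text/html, text/plain

The ICAP server replies with an ICAP status line and either the (possibly modified) request or, for example, an error response to be returned to the user.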

It is up to the ICAP client to decide which of the transmitted data will be processed; in some cases this makes it impossible to analyze the data fully. Client settings are entirely implementation-dependent and in many cases cannot be changed.

After receiving data from the client, the ICAP server performs data processing and, if necessary, data modification. The data is then returned to the ICAP client, and it passes it on to the server or client, depending on which direction it was sent.

ICAP is most widely used in anti-malware products because it allows these checks to be used across products and is independent of the platform on which the ICAP client is running.

The disadvantages of using ICAP include the following:

  • the additional network interaction between the client and the server somewhat slows down data transfer between external systems and information consumers;
  • there are checks that need to be performed not on the client but on the ICAP server, such as determining the data type, etc. This matters because in many cases ICAP clients rely on the file extension or the data type reported by the external server, which can lead to security policy violations;
  • difficult integration with systems using protocols other than HTTP prevents the use of ICAP for deep data analysis.

OPES Protocol

Unlike ICAP, the OPES protocol was developed with the characteristics of specific protocols in mind. It was also designed with the shortcomings of ICAP in view, such as the lack of authentication of clients and servers, etc.

Like ICAP, OPES has been adopted by the Internet Engineering Task Force as a standard. The service interaction structure, interaction protocol, service requirements, and service security solutions are set out in RFC 3752, 3835, 3836, 3837, and others. The list is regularly updated with new documents describing the application of OPES not only to the processing of Internet traffic, but also to the processing of mail traffic, and in the future, possibly other types of protocols.

The structure of interaction between OPES servers and clients (OPES Processor) is shown in the figure. In general terms, it is similar to the scheme of interaction between ICAP servers and clients, but there are also significant differences:

  • there are requirements for the implementation of OPES clients, which makes it possible to more conveniently manage them - setting filtering policies, etc.;
  • the data consumer (user or information system) can influence the processing of data. For example, when using automatic translators, the received data can be automatically translated into the language used by the user;
  • systems that provide data can also influence the results of processing;
  • processing servers can use for analysis data specific to the protocol by which the data was transmitted to the OPES client;
  • some data processing servers may receive more sensitive data if they are in a trust relationship with the OPES client, consumers and/or information providers.

All of the listed options depend solely on the configuration used when implementing the system. Due to these possibilities, the use of OPES is more promising and convenient than the use of the ICAP protocol.

In the near future, products that support OPES alongside the ICAP protocol are expected to appear. But since there are no full-fledged OPES implementations yet, final conclusions about the shortcomings of this approach cannot be drawn, although theoretically only one drawback suggests itself: increased processing time due to the interaction between OPES clients and servers.

HTTPS and other types of encrypted traffic

According to some analysts, up to 50% of Internet traffic is encrypted. The problem of controlling encrypted traffic is now relevant for many organizations, since users can use encryption to create information leakage channels. In addition, encrypted channels can also be used by malicious code to penetrate computer systems.

There are several tasks associated with processing encrypted traffic:

  • analysis of data transmitted over encrypted channels;
  • verification of certificates that are used by servers to organize encrypted channels.

The relevance of these tasks is increasing every day.

Encrypted data transmission control

Controlling the transmission of data sent over encrypted channels is probably the most important task for organizations whose employees have access to Internet resources. To implement this control there is an approach called "Man-in-the-Middle", the same technique attackers can use to intercept data. The data processing scheme for this method is shown in the figure:

The data processing process is as follows:

  • a specially issued root certificate is installed in the user's Internet browser; the proxy server uses it to sign the certificates it generates (without this certificate installed, the user's browser will report that the signing certificate was issued by an untrusted organization);
  • when a connection is established with the proxy server, a data exchange takes place and a specially generated certificate containing the destination server's details, but signed with the known key, is presented to the browser, which allows the proxy server to decrypt the transmitted traffic;
  • the decrypted data is parsed in the same way as normal HTTP traffic;
  • the proxy server establishes a connection with the server to which the data should be transferred, using the server's certificate to encrypt the channel;
  • the data returned from the server is decrypted, parsed and passed to the user, encrypted with the proxy server's certificate.

When using this scheme for processing encrypted data, problems can arise with verifying the user's authenticity. In addition, the certificate must be installed in the Internet browsers of all users (otherwise users will receive a message that the certificate was signed by an unknown company, which tells them that their data transfers are being monitored).

The following products are now on the market to control the transmission of encrypted data: Webwasher SSL Scanner from Secure Computing, Breach View SSL, WebCleaner.

Certificate Authentication

The second problem that arises when using encrypted data transmission channels is the authentication of certificates provided by the servers that users work with.

Attackers can attack information systems by creating a false DNS entry that redirects user requests not to the site they need, but to the site created by the attackers themselves. These bogus sites can steal important user data such as credit card numbers, passwords, etc., and download malicious code under the guise of software updates.

To prevent such cases, there is specialized software that checks the compliance of certificates provided by the server with the data they report.

In the event of a mismatch, the system may block access to such sites or grant access after explicit confirmation by the user. In this case, data processing is performed in almost the same way as when analyzing data transmitted over encrypted channels, only in this case it is not the data that is analyzed, but the certificate provided by the server.
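
For a manual check of what certificate a server actually presents, OpenSSL can be used (the hostname is an example):

$ openssl s_client -connect www.example.com:443 -servername www.example.com -showcerts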

Mail traffic filtering

When using e-mail, organizations need to protect both incoming and outgoing traffic, but the tasks are quite different for each direction. Incoming traffic must be checked for malicious code, phishing and spam, while outgoing mail is checked for content whose transmission could leak important information, spread compromising materials, and the like.

Most of the products on the market provide control over incoming traffic only. This is done through integration with anti-virus systems, implementation of various anti-spam and anti-phishing mechanisms. Many of these functions are already built into email clients, but they cannot completely solve the problem.

There are currently several ways to protect users from spam:

  • comparison of received messages against an existing database of messages. Various techniques can be used, including genetic algorithms that can extract keywords even when they are distorted;
  • dynamic categorization of messages by content, which very effectively detects unwanted correspondence. To counter this method, spammers send messages as images with text inside and/or as sets of dictionary words that create noise interfering with these systems. To fight such spam, methods such as wavelet analysis and/or text recognition in images are already being used;
  • gray, white and black access lists, which describe the policy for accepting mail from known and unknown sites. Graylisting in many cases prevents delivery of unwanted messages thanks to the specifics of the software that sends spam. Blacklists can be maintained both as local databases managed by the administrator and as global databases built from user reports around the world. However, using global databases carries the risk that entire networks, including those containing "good" mail servers, may end up on them.

To combat information leaks, a variety of methods are used, based on intercepting and deeply analyzing messages in accordance with a complex filtering policy. This requires correctly determining file types, languages and text encodings, and conducting semantic analysis of the transmitted messages.

Another application of systems for filtering mail traffic is the creation of encrypted mail streams, when the system automatically signs or encrypts the message, and the data is automatically decrypted at the other end of the connection. This functionality is very convenient if you want to process all outgoing mail, but it must reach the recipient in encrypted form.

Instant Message Filtering

Instant messaging is gradually becoming an actively used tool in many companies: it provides quick interaction with employees and/or customers. So it is quite natural that the development of a tool which, among other things, can become a channel for information leaks has led to the appearance of tools for controlling the transmitted information.

Currently, the most commonly used instant messaging protocols are MSN (Microsoft Network), AIM (AOL Instant Messaging), Yahoo! Chat and Jabber, and their corporate counterparts: Microsoft Live Communications Server (LCS), IBM SameTime and Yahoo Corporate Messaging Server. In the CIS, the ICQ system, now owned by AOL and using almost the same protocol as AIM, is widespread. All of these systems do almost the same thing: they transmit messages (either through a server or directly) and files.

Nowadays almost all of these systems can make calls from computer to computer and/or to regular phones, which creates certain difficulties for control systems and requires VoIP support for a full-fledged proxy server implementation.

Typically, IM traffic control products are implemented as an application gateway that parses the transmitted data and blocks the transmission of prohibited data. However, there are implementations in the form of specialized IM servers that perform the necessary checks at the server level.

The most requested features of IM traffic control products are:

  • access control by individual protocol;
  • control of the clients used, etc.;
  • per-user access control:
    • allowing the user to communicate only within the company;
    • allowing the user to communicate only with certain users outside the company;
  • control of transmitted text;
  • file transfer control, covering:
    • file size;
    • file type and/or extension;
  • direction of data transfer;
  • checking for malicious content;
  • detection of SPIM;
  • saving transmitted data for later analysis.

The following products currently allow you to control instant messaging:

  • CipherTrust IronIM from Secure Computing. This product supports the AIM, MSN, Yahoo! Chat, Microsoft LCS and IBM SameTime protocols and is currently one of the most complete solutions;
  • Symantec's IM Manager (developed by IMLogic, which was acquired by Symantec). This product supports the Microsoft LCS, AIM, MSN, IBM SameTime, ICQ and Yahoo! Chat protocols;
  • Antigen for Instant Messaging from Microsoft also works with almost all popular instant messaging protocols.

Other companies' products (ScanSafe, ContentKeeper) have fewer features than those listed above.

It is also worth noting that two Russian companies, Grand Prix (with the SL-ICQ product) and Mera.ru (with the Sormovich product), provide products for monitoring the transmission of messages over the ICQ protocol.

VoIP filtering

The growing popularity of tools for transferring audio between computers (also called Voice over IP, VoIP) makes it necessary to take measures to control the transfer of such information as well. Implementations exist for computer-to-computer and/or computer-to-phone calls.

There are standardized protocols for exchanging such information, such as the Session Initiation Protocol (SIP) adopted by the IETF and H.323 developed by the ITU. These protocols are open, which makes it possible to process them.

In addition, there are protocols developed by specific companies without open documentation, which makes working with them very difficult. One of the most popular implementations is Skype, which has gained wide popularity around the world. This system allows calls between computers, calls to landline and mobile phones, and receiving calls from landline and mobile phones. The latest versions also support video transfer.

Most of the products currently available can be divided into two categories:

  • products that identify and block VoIP traffic;
  • products that identify, capture and analyze VoIP traffic.

The first category includes:

  • Dolphian products, which identify and allow or deny VoIP traffic (SIP and Skype) encapsulated in standard HTTP traffic;
  • Verso Technologies products;
  • various types of firewalls that have this capability.

The second category includes:

  • the product of the Russian company Sormovich, which supports the capture, analysis and storage of voice information transmitted via the H.323 and SIP protocols;
  • the open-source library Oreka, which can identify the signaling component of audio traffic and capture the transmitted data, which can then be analyzed by other means.

It recently became known that a product developed by ERA IT Solutions AG can intercept VoIP traffic transmitted by the Skype program. However, such control requires installing a specialized client on the computer running Skype.

Peer-to-peer filtering

The use of various peer-to-peer (p2p) networks by employees poses the following threats to organizations:

  • distribution of malicious code;
  • information leak;
  • distribution of copyrighted data, which may lead to prosecution;
  • decrease in labor productivity;

There are a large number of peer-to-peer networks. Some have central servers used to coordinate users; others are completely decentralized. The latter are especially difficult to control using standard tools such as firewalls.

To solve this problem, many companies create products that allow detecting and processing p2p traffic. There are the following solutions for processing p2p traffic:

  • SurfControl Instant Messaging Filter, which handles p2p on par with instant messaging;
  • the Websense Enterprise package also provides users with tools to control p2p traffic;
  • Webwasher Instant Message Filter allows you to control access to various p2p networks.

The use of these or other products not listed here dramatically reduces the risks associated with user access to p2p networks.

Unified Threat Management

Unified Threat Management solutions are offered by many security vendors. As a rule, they are built on the basis of firewalls, which, in addition to the main functions, also perform the functions of content filtering of data. Typically, these features focus on preventing intrusions, malicious code penetration, and spam.

Many of these products are implemented as hardware and software solutions that cannot completely replace mail and Internet traffic filtering solutions, since they work with only a limited number of capabilities provided by specific protocols. They are typically used to avoid duplication of functionality across products and to ensure that all application protocols are handled against the same known threat database.

The most popular Unified Threat Management solutions are the following products:

  • SonicWall Gateway Anti-Virus, Anti-Spyware and Intrusion Prevention Service provides anti-virus and other data protection for the SMTP, POP3, IMAP, HTTP, FTP and NetBIOS protocols, instant messaging protocols, and many streaming protocols used to transfer audio and video;
  • the ISS Proventia Network Multi-Function Security series of appliances, delivered as hardware-software systems, blocks malicious code, unwanted messages and intrusions. The delivery includes a large number of checks (including VoIP checks), which the user can extend;
  • Secure Computing's Network Gateway Security hardware platform, in addition to protection against malicious code and spam, also supports VPN. Almost all Secure Computing solutions are integrated into this platform.

There are other products, but those listed above are widely used.

Interception of data

Data interception (lawful interception) has almost always been used by intelligence agencies to collect and analyze transmitted information. Recently, however, the issue of data interception (not only of Internet traffic but also of telephony and other types) has become very topical in light of the fight against terrorism. Even states that have always opposed such systems have begun to use them to control the transfer of information.

Since various types of data are intercepted, often transmitted over high-speed channels, the implementation of such systems requires specialized software for capturing and parsing data and separate software for analyzing the collected data. As such, software for content filtering of one or another protocol can be used.

Perhaps the most famous of these systems is the Anglo-American Echelon system, which has long been used to intercept data in the interests of various US and British agencies. In addition, the US National Security Agency uses the Narus system, which allows real-time monitoring and analysis of Internet traffic.

Among the Russian products we can mention solutions from the Sormovich company, which allows capturing and analyzing mail, audio, and various types of Internet traffic (HTTP and others).

Conclusion

The development of information systems leads to the emergence of more and more new threats. Therefore, the development of content filtering products not only does not lag behind, but sometimes even anticipates the emergence of new threats, reducing the risks for protected information systems.

Module start page

A proxy server is a service that allows clients to make indirect requests to other network services. First, the client connects to the proxy server and requests a web resource located on another server. The proxy server then either connects to the specified server and obtains the resource from it, or returns the resource from its own cache (if a client has already accessed it). In some cases, the client's request or the server's response may be modified by the proxy server for certain purposes.

The proxy server also makes it possible to analyze the client HTTP requests passing through it, and to filter and account for traffic by URL and MIME type. In addition, the proxy server implements a mechanism for accessing the Internet by login/password.

The proxy server performs caching of objects received by users from the Internet and thereby reduces traffic consumption and increases page loading speed.

When you enter the module, the status of the services, the "Disable" button (or "Enable" if the module is disabled) and the latest messages in the log are displayed.

Settings

Usually, to work through a proxy server, you must specify its address and port in the browser settings. However, if user login/password authorization is not used, then the transparent proxy function can be used.

In this case, all HTTP requests from the local network are automatically routed through a proxy server. Thus, it becomes possible to filter and account for traffic by URL, regardless of the settings of client computers.

The proxy server's default port is 3128; in the module settings you can change it to any free port.

Authorization types

The ICS proxy server supports two authorization methods: by the user's IP address and by login/password.

Authorization by IP address is suitable when the user always works from the same computer: the proxy determines which user owns particular traffic based on the computer's IP address. This method is not suitable for terminal servers, where several users work from one IP address, nor for organizations where users constantly move between workstations. In addition, a user can change the IP address of his computer, and if MAC-to-IP binding is not configured, ICS will mistake him for someone else.

Authorization by login/password solves the problem of tying users to specific computers. In this case, the first time any Internet resource is accessed, the browser prompts the user for a login and password for Internet access. If the users on your network are authorized in a domain, you can set the authorization type to "Via Domain": then, if ICS is connected to the domain controller and the users have been imported from the domain, authorization is performed transparently, without any login/password prompt.

Also remember that proxy authorization applies only to users' HTTP traffic. Internet access for programs using protocols other than HTTP is regulated by the firewall, which has only one authorization method: by IP address. In other words, a user who has only login/password authorization will not be able to use mail, a Jabber client, a torrent client, or other programs that do not support working through an HTTP proxy.

Web Authorization

To authorize users by username and password on machines where no proxy server is configured, you can use web authorization (captive portal) by enabling the corresponding checkbox. Web authorization also allows, for example, integrating the authorization page into a corporate portal and using it as the login page. The default web authorization port is 82; you can change it to any free port.

To avoid configuring the proxy server manually on every client machine, you can use the autoconfigurator. Set the "Automatic proxy configuration" option in the client's browser; all other settings will be determined by ICS.

The autoconfigurator is enabled by a checkbox on the corresponding tab. You can select one or more of the available protocols (HTTP, HTTPS, FTP).

The option to publish the autoconfiguration script determines whether it will be available at the server's IP address or at a virtual host with a domain name created for it. When you select a virtual host, it is created in the system automatically. The "Create an entry on the DNS server" checkbox automatically adds a zone with the records needed for this virtual host.

Publish autoconfiguration script over DHCP: this parameter sends the proxy settings to all DHCP clients of the server.

Parent Proxy

If your organization has several hierarchically arranged proxy servers, the proxy server upstream of ICS will be its parent proxy. Moreover, any network node can act as the parent proxy.

For ICS to redirect requests arriving at its proxy server to the parent proxy, specify the parent's IP address and destination port on the "Parent proxy" tab.

Proxy servers can exchange their cache data using the ICP protocol. When the network operates through several proxies, this can significantly speed up the work. If the parent proxy supports the protocol, check the corresponding box and specify the service port (3130 by default).
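
For comparison, in Squid terms (the proxy on which many such products are based) a parent proxy with ICP would be declared with a line like this sketch (the address and ports are examples):

cache_peer 192.168.0.1 parent 3128 3130 default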

Issued IP addresses

This tab lists the IP addresses and users authorized on the proxy server via web authorization.

Cache contents

The Log tab contains a summary of all proxy server system messages. The log is divided into pages; using the "Forward" and "Back" buttons you can move between pages, or enter a page number in the field and jump straight to it.

Log entries are color-coded by message type: normal system messages are marked in white, system status messages (startup/shutdown, cache processing) in green, and errors in red.

There is a search bar in the top right corner of the module. With it, you can search the log for the entries you need.

The log always displays events for the current date. To view events on another day, select the desired date using the calendar in the upper left corner of the module.