March 21, 2008

URL filtering and redirection with squid proxy server

A friend of mine who works at a high school asked me to help him with URL filtering for the students' PCs. The students often spend their time chatting or looking at nasty web pages, which he wanted to block.

The school is connected to the Internet through a Linux server which acts as a router with NAT. For historical reasons a squid proxy was also running in transparent mode on the server, which made the solution simpler. Without a proxy I would probably have started to play with l7-filter and iptables.

Squid offers several methods to filter URLs. You can create ACLs in the squid.conf configuration file to block certain URLs, or you can plug in an external redirector program which rewrites URLs according to its own rules.

First I tried the external redirector. SquidGuard is one option, but it felt like taking a sledgehammer to a fly. So I refreshed my perl coding skills and created a perl redirector script. The script reads the list of blocked URLs from a file, as well as a list of source addresses which get full Internet access.

redirector script:

$ cat /usr/local/bin/squid_blocker.pl
#!/usr/bin/perl -w

$db_block="/usr/local/lib/squid_blocker.list";
$db_white="/usr/local/lib/squid_blocker_white.list";

$|=1;
while (<>) {
        my @X = split;
        my $url = $X[0];
        my $src = $X[1];

        open(DAT, $db_white) || die($url);
        @white_data=<DAT>;
        close(DAT);

        $found=0;
        foreach $wip (@white_data) {
                $wip =~ s/\s+$//;
                if ($src =~ m/$wip/) {
                        print "$url\n";
                        $found=1;
                        last;
                }
        }
        if ($found == 0) {
                open(DAT, $db_block) || die($url);
                @blocked_data=<DAT>;
                close(DAT);
                foreach $burl (@blocked_data) {
                        $burl =~ s/\s+$//;
                        if ($url =~ m/$burl/i) {
                                print "302:http://www.jozjan.net\n";
                                $found=1;
                                last;
                        }
                }
                if ($found == 0) {
                        print "$url\n";
                }
        }
}
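
For reference, squid hands a redirector one request per line in the form "URL client-ip/fqdn ident method" and expects one line back. The matching logic of the script can be sketched self-contained like this (a rough Python translation with the two list files inlined, not the script squid actually runs):

```python
import re

WHITELIST = [r"10\.0\.0\.1"]                    # source IPs with full access
BLACKLIST = [r"http(s?):\/\/[^\/]*pokec\.sk"]   # URL patterns to block
REDIRECT = "302:http://www.jozjan.net"

def rewrite(line: str) -> str:
    """Handle one redirector input line: 'URL src/fqdn ident method'."""
    url, src = line.split()[:2]
    if any(re.search(w, src) for w in WHITELIST):
        return url                              # whitelisted client: pass through
    if any(re.search(b, url, re.IGNORECASE) for b in BLACKLIST):
        return REDIRECT                         # blocked URL: answer with a 302
    return url                                  # everything else: pass through

print(rewrite("http://www.pokec.sk/ 10.0.0.5/- - GET"))   # blocked client
print(rewrite("http://www.pokec.sk/ 10.0.0.1/- - GET"))   # whitelisted client
```

A line from a whitelisted source comes back untouched; a blacklisted URL from anyone else gets the 302 redirect.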

blacklist file:

$ cat /usr/local/lib/squid_blocker.list
http(s?):\/\/[^\/]*pokec\.sk

whitelist file:

$ cat /usr/local/lib/squid_blocker_white.list
10\.0\.0\.1

squid configuration for redirector:

$ grep squid_blocker /etc/squid/squid.conf
redirect_program /usr/local/bin/squid_blocker.pl

Unfortunately this solution turned out to be slow. A standard web page took about three times longer to load when the redirector was used, and for some weird reason parts of some pages didn't load at all :-( The advantages of this solution were that the blocked URL and allowed source IP files could be updated without restarting the proxy server, and that blocked URLs were redirected to another website. Next time I will try to write the redirector in C; it should have better performance :)
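
In hindsight, one knob worth checking (an assumption on my part, it was not part of the setup above) is how many redirector processes squid spawns: every request waits for a free redirector child, so too few of them can stall page loads. In squid 2.x this is set with the redirect_children directive in squid.conf:

```
# squid.conf: run more copies of the (slow) perl redirector in parallel
# (the squid 2.x default is 5)
redirect_children 10
```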

I moved on to the first mentioned option - an ACL in squid.conf. You can define an ACL whose contents are stored in an external file, so you don't have to write the blocked URLs directly into the squid.conf file.

squid configuration with ACL:

$ grep black /etc/squid/squid.conf
acl blacksites url_regex "/etc/squid/blacksites"
acl blacksites_wip src "/etc/squid/blacksites_wip"
http_access deny blacksites !blacksites_wip

blacklist file:

$ cat /etc/squid/blacksites
[^/]*pokec.sk
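
Note that squid's url_regex ACL matches anywhere in the URL and is case-sensitive unless you add the -i flag, and the unescaped dot in the pattern above matches any character, not just a literal dot. A quick check of what the pattern catches (Python is used here only to exercise the regex):

```python
import re

pattern = r"[^/]*pokec.sk"   # pattern from /etc/squid/blacksites; '.' is unescaped

for url in ("http://www.pokec.sk/chat",
            "http://pokecXsk.example.com/",   # the unescaped dot also matches this host
            "http://www.example.com/"):
    print(url, "->", "deny" if re.search(pattern, url) else "allow")
```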

whitelist file:

$ cat /etc/squid/blacksites_wip
10.0.0.1
10.0.0.51

This solution has the best performance, but after each update to the blacklist or whitelist file it is necessary to restart the squid proxy server (or at least reload its configuration with squid -k reconfigure).