Six Things First-Time Squid Administrators Should Know
by Duane Wessels, author of Squid: The Definitive Guide
02/12/2004
New users often struggle with the same frustrating set of Squid idiosyncrasies. In this article, I'll detail six things you should know about using Squid from the get-go. Even if you're an experienced Squid administrator, you might want to look at these tips and give your configuration file a sanity check, especially the one about preventing spam.
File descriptor limits are a common problem for new Squid users. This happens because some operating systems have relatively low per-process and system-wide limits. In some cases, you must take steps to tune your system before compiling Squid.
A file descriptor is simply a number that represents an open file or socket. Every time a process opens a new file or socket, it allocates a new file descriptor. These descriptors are reused after the file or socket is closed. Most Unix systems place a limit on the number of simultaneously open file descriptors. There are both per-process and per-system limits.
How many file descriptors does Squid need? The answer depends on how many users you have, the size of your cache, and which particular features you have enabled. Here are some of the things that consume file descriptors in Squid:
Even when Squid is not doing anything, it has some number of file descriptors open for log files and helpers. In most cases, this is between 10 and 25, so it's probably not a big deal. If you have a lot of external helpers, that number goes up. However, the file descriptor count really goes up once Squid starts serving requests. In the worst case, each concurrent request requires three file descriptors: the client-side connection, a server-side connection for cache misses, and a disk file for reading hits or writing misses.
A Squid cache with just a few users might be able to get by with a file descriptor limit of 256. For a moderately busy Squid, 1024 is a better limit. Very busy caches should use 4096 or more. One thing to keep in mind is that file descriptor usage often surges above the normal level for brief amounts of time. This can happen during short, temporary network outages or other interruptions in service.
There are a number of ways to determine the file descriptor limit on your system. One is to use the built-in shell commands limit or ulimit.
For Bourne shell users:
root# ulimit -n
1024
For C shell users:
root# limit desc
descriptors 1024
If you already have Squid compiled and installed, you can just look at the cache.log file for a line like this:
2003/12/12 11:10:54| With 1024 file descriptors available
If Squid detects a file descriptor shortage while it is running, you'll see a warning like this in cache.log:
WARNING! Your cache is running out of file descriptors
If you see the warning, or know in advance that you'll need more file descriptors, you should increase the limits. The technique for increasing the file descriptor limit varies between operating systems.
Linux users need to edit one of the system include files and twiddle one of the system parameters via the /proc interface. First, edit /usr/include/bits/types.h and change the value for __FD_SETSIZE. Then, give the kernel a new limit with this command:
root# echo 1024 > /proc/sys/fs/file-max
Finally, before compiling or running Squid, execute this shell command to set the process limit equal to the kernel limit:
root# ulimit -Hn 1024
After you have set the limit in this manner, you'll need to reconfigure, recompile, and reinstall Squid. Also note that these two commands do not permanently set the limit. They must be executed each time your system boots. You'll want to add them to your system startup scripts.
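For example, on many Linux systems you can append the same two commands to a boot-time script such as /etc/rc.d/rc.local (the exact file varies by distribution; this is only a sketch). Keep in mind that ulimit affects only the shell that runs it, so Squid should be started from the same script:

# raise the system-wide limit, then the per-process hard limit
echo 1024 > /proc/sys/fs/file-max
ulimit -Hn 1024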
On BSD-based systems, you'll need to compile a new kernel. The kernel configuration file lives in a directory such as /usr/src/sys/i386/conf or /usr/src/sys/arch/i386/conf. There you'll find a file, possibly named GENERIC, to which you should add a line like this:
options MAXFILES=8192
For OpenBSD, use option instead of options. Reboot your system after you've finished configuring, compiling, and installing your new kernel. Then, reconfigure, recompile, and reinstall Squid.
On Solaris, add this line to your /etc/system file:
set rlim_fd_max = 1024
Then, reboot the system, reconfigure, recompile, and reinstall Squid.
For further information on file descriptor limits, see Chapter 3, "Compiling and Installing", of Squid: The Definitive Guide or section 11.4 of the Squid FAQ.
Directory permissions are another problem that first-time users often encounter. One of the reasons for this difficulty is that, in the interest of security, Squid refuses to run as root. Furthermore, if you do start Squid as root, it switches to a default user ("nobody") that has no special privileges. If you don't want to use the "nobody" userid, you can set your own with the cache_effective_user directive in the configuration file.
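For example, if you have created a dedicated account for Squid (the username squid below is just an illustration), the directive looks like this:

cache_effective_user squid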
Certain files and directories must be writable by the Squid userid. These include the log files, usually found in /usr/local/squid/var/logs, and the cache directories, /usr/local/squid/var/cache by default.
As an example, let's assume that you're using the "nobody" userid for Squid. After running make install, you can use these commands to set the permissions for the log files and cache:
root# chown -R nobody /usr/local/squid/var/logs
root# chown -R nobody /usr/local/squid/var/cache
Then, you can proceed to initialize the cache directories with this command:
root# /usr/local/squid/sbin/squid -z
Helper processes are another source of potential permission problems. Squid spawns the helper processes as the unprivileged user (that is, as "nobody"). This usually means that the helper program must have read and execute permissions for everyone (for example, -rwxr-xr-x). Furthermore, any configuration or password files that the helper needs must have appropriate read permissions as well.
Note that Unix also requires correct permissions on parent directories leading to a file. For example, if /usr/local/squid is owned by root with -rwxr-x--- permissions, the user nobody will not be able to access any of the directories underneath it. /usr/local/squid should be -rwxr-xr-x instead.
You may want to debug file or directory permission problems from a shell window. If Squid runs as nobody, then start a shell process as user nobody:
root# su - nobody
(You may have to temporarily change "nobody"'s home directory and shell program for this to work.) Then, try to read, write, or execute the files that are giving you trouble. For example:
nobody$ cd /usr
nobody$ cd local
nobody$ cd squid
nobody$ cd var
nobody$ cd logs
nobody$ touch cache.log
Squid tends to be a bit of a memory hog. It uses memory for many different things, some of which are easier to control than others. Memory usage is important because if the Squid process size exceeds your system's RAM capacity, some chunks of the process must be temporarily swapped to disk. Swapping can also happen if you have other memory-hungry applications running on the same system. Swapping causes Squid's performance to degrade very quickly.
An easy way to monitor Squid's memory usage is with standard system tools such as top and ps. You can also ask Squid itself how much memory it is using, through either the cache manager or SNMP interfaces. If the process size becomes too large, you'll want to take steps to reduce it. A good rule of thumb is to not let Squid's process size exceed 60% to 80% of your RAM capacity.
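For example, here are two quick checks. The second assumes the squidclient program was installed in its default location; the exact labels in the cache manager output vary between Squid versions:

root# ps aux | grep squid
root# /usr/local/squid/bin/squidclient mgr:info | grep -i size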
One of the most important uses for memory is the main cache index. This is a hash table that contains a small amount of metadata for each object in the cache. Unfortunately, all of these "small" data structures add up to a lot when Squid contains millions of objects. The only way to control the size of the in-memory index is to change Squid's disk cache size (with the cache_dir directive). Thus, if you have plenty of disk space, but are short on RAM, you may have to leave the disk space underutilized.
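For example, this cache_dir line (with illustrative values) limits the disk cache to 7168 MB; the first number is the cache size in megabytes, and the 16 and 256 set the number of first- and second-level subdirectories:

cache_dir ufs /usr/local/squid/var/cache 7168 16 256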
Squid's in-memory cache can also use significant amounts of RAM. This is where Squid stores incoming and recently retrieved objects. Its size is controlled by the cache_mem directive. Note that the cache_mem directive only affects the size of the memory cache, not Squid's entire memory footprint.
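For example, to let Squid hold up to 64 MB of objects in memory (the figure is arbitrary; pick one that suits your RAM budget):

cache_mem 64 MB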
Squid also uses some memory for various I/O buffers. For example, each time a client makes an HTTP request to Squid, a number of memory buffers are allocated and then later freed. Squid uses similar buffers when forwarding requests to origin servers, and when reading and writing disk files. Depending on the amount and type of traffic coming to Squid, these I/O buffers may require a lot of memory. There's not much you can do to control memory usage for these purposes. However, you can try changing the TCP receive buffer size with the tcp_recv_bufsize directive.
If you have a large number of clients accessing Squid, you may find that the "client DB" consumes more memory than you would like. It keeps a small number of counters for each client IP address that sends requests to Squid. You can reduce Squid's memory usage a little by disabling this feature. Simply put client_db off in squid.conf.
Another thing that can help is to simply restart Squid periodically, say, once per week. Over time, something may happen (such as a network outage) that causes Squid to temporarily allocate a large amount of memory. Even though Squid may not be using that memory, it may still be attached to the Squid process. Restarting Squid allows your operating system to truly free up the memory for other uses.
You can use Squid's high_memory_warning directive to warn you when its memory size exceeds a certain limit. For example, add a line like this to squid.conf:
high_memory_warning 400 MB
Then, if the process grows beyond that value, Squid writes warnings to cache.log and, if configured, to syslog.
Squid writes to various log and journal files as it runs. These files will continually increase in size unless you take steps to "rotate" them. Rotation refers to the process of closing a log file, renaming it, and opening a new log file. It's similar to the way that most systems deal with their syslog files, such as /var/log/messages.
If you don't rotate the log files, they may eventually consume all free space on that partition. Some operating systems, such as Linux, cannot support files larger than 2GB. When a log file reaches that size, you'll get a "File too large" error message, and Squid will complain and restart.
To avoid such problems, create a cron job that periodically rotates the log files. It can be as simple as this:
0 0 * * * /usr/local/squid/sbin/squid -k rotate
In most cases, daily log file rotation is the most appropriate. A not-so-busy cache can get by with weekly or monthly rotation.
Squid appends numeric suffixes to rotated log files. Each time you run squid -k rotate, each file's numeric suffix is incremented by one. Thus, cache.log.0 becomes cache.log.1, cache.log.1 becomes cache.log.2, and so on. The logfile_rotate directive specifies the maximum number of old files to keep around.
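For example, to keep the seven most recent old copies of each log file:

logfile_rotate 7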
Logfile rotation affects more than just the log files in /usr/local/squid/var/logs. It also generates new swap.state files for each cache directory. However, Squid does not keep old copies of the swap.state files. It simply writes a new file from the in-memory index and forgets about the old one.
Squid has an extensive, but somewhat confusing, set of access controls. The most important thing to understand is the difference between ACL types, elements, and rules, and how they work together to allow or deny access.
Squid has about 20 different ACL types. These refer to certain aspects of an HTTP request or response, such as the client's IP address (the src type), the origin server's hostname (the dstdomain type), and the HTTP request method (the method type).
An ACL element consists of three components: a type, a name, and one or more type-specific values. Here are some simple examples:
acl Foo src 1.2.3.4
acl Bar dstdomain www.cnn.com
acl Baz method GET
The above ACL element named Foo would match a request that comes from the IP address 1.2.3.4. The ACL named Bar matches a www.cnn.com URL. The Baz ACL matches an HTTP GET request. Note that we are not allowing or denying anything yet.
For most of the ACL types, an element can have multiple values, like this:
acl Argle src 1.1.1.8 1.1.1.28 1.1.1.88
acl Bargle dstdomain www.nbc.com www.abc.com www.cbs.com
acl Fraggle method PUT POST
A multi-valued ACL matches a request when any one of its values is a match; the values use OR logic. The Argle ACL matches a request from 1.1.1.8, from 1.1.1.28, or from 1.1.1.88. The Bargle ACL matches requests to the NBC, ABC, or CBS web sites. The Fraggle ACL matches a request with the PUT or POST method.
Now that you're an expert in ACL elements, it's time to graduate to ACL rules. These are where you say that a request is allowed or denied. Access list rules refer to ACL elements by their names and contain either the allow or deny keyword. Here are some simple examples:
http_access allow Foo
http_access deny Bar
http_access allow Baz
It is important to understand that access list rules are checked in order and that the decision is made when a match is found. Given the above list, let's see what happens when a user from 1.2.3.4 makes a GET request for www.cnn.com. Squid encounters the allow Foo rule first. Our request matches the Foo ACL, because the source address is 1.2.3.4, and the request is allowed to proceed. The remaining rules are not checked.
How about a PUT request for www.cnn.com from 5.5.5.5? The request does not match the first rule. It does match the second rule, however. This access list rule says that the request must be denied, so the user receives an error message from Squid.
How about a GET request for www.oreilly.com from 5.5.5.5? The request does not match the first rule (allow Foo). It does not match the second rule, either, because www.oreilly.com is different from www.cnn.com. However, it does match the third rule, because the request method is GET.
Of course, these simple ACL rules are not very interesting. The real power comes from Squid's ability to combine multiple elements on a single rule. When a rule contains multiple elements, each element must be a match in order to trigger the rule. In other words, Squid uses AND logic for access list rules. Consider this example:
http_access allow Foo Bar
http_access deny Foo
The first rule says that a request from 1.2.3.4 AND for www.cnn.com will be allowed. However, the second rule says that any other request from 1.2.3.4 will be denied. These two lines restrict the user at 1.2.3.4 to visiting only the www.cnn.com site. Here's an even more complex example:
http_access deny Argle Bargle Fraggle
http_access allow Argle Bargle
http_access deny Argle
These three lines allow the Argle clients (1.1.1.8, 1.1.1.28, and 1.1.1.88) to access the Bargle servers (www.nbc.com, www.abc.com, and www.cbs.com), but not with PUT or POST methods. Furthermore, the Argle clients are not allowed to access any other servers.
One of the common mistakes new users make is to write a rule that can never be true. It is easy to do if you forget that Squid uses AND logic on rules and OR logic on elements. Here is a configuration that can never be true:
acl A src 1.1.1.1
acl B src 2.2.2.2
http_access allow A B
The reason is that a request cannot come from both 1.1.1.1 AND 2.2.2.2 at the same time. Most likely, it should be written like this:
acl A src 1.1.1.1 2.2.2.2
http_access allow A
Then, requests from either 1.1.1.1 or 2.2.2.2 are allowed.
Access control rules can become long and complicated. When adding a new rule, how do you know where it should go? You should put more-specific rules before less-specific ones. Remember that the rules are checked in order. When adding a rule, go through the current rules in your head and see where the new one fits. For example, let's say that you want to deny requests to a certain site, but allow all others. It should look like this:
acl XXX dstdomain www.badsite.net
acl All src 0/0
http_access deny XXX
http_access allow All
Now, what if you need to make an exception for one user, so that she can visit that site? The new ACL element is:
acl Admin src 3.3.3.3
and the new rule should be:
http_access allow Admin XXX
but where does it go? Since this rule is more specific than the deny XXX rule, it should go first:
http_access allow Admin XXX
http_access deny XXX
http_access allow All
If we place the new rule after deny XXX, it will never even get checked. The first rule will always match the request, and she will not be able to visit the site.
When you first install Squid, the access control rules will deny every request. To get things working, you'll need to add an ACL element and a rule for your local network. The easiest way is to write a source IP address ACL element for your subnet(s). For example:
acl MyNetwork src 192.168.0.0/24
Then, search through squid.conf for this line:
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
After that line, add an http_access line with an allow rule:
http_access allow MyNetwork
Once you get this simple configuration working, feel free to move on to some of the more advanced ACL features, such as username-based proxy authentication.
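As a taste of what that looks like, here is a minimal sketch using the basic NCSA-style authentication helper; the helper and password file paths are assumptions that depend on how Squid was built on your system:

auth_param basic program /usr/local/squid/libexec/ncsa_auth /usr/local/squid/etc/passwd
acl Authenticated proxy_auth REQUIRED
http_access allow Authenticated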
Unless you've been living under a rock, you're aware of the spam problem on the Internet. Spam senders used to take advantage of open email relays. These days, a lot of spam comes from open proxies. An open proxy is one that allows outsiders to make requests through it. If others on the Internet receive spam email from your proxy, your IP address will be placed on one or more of the various blackhole lists. This will adversely affect your ability to communicate with other Internet sites.
Use the following access control rules to make sure this never happens to you. First, always deny all requests that don't come from your local network. Define an ACL element for your subnet:
acl MyNetwork src 10.0.0.0/16
Then, place a deny rule near the top of your http_access rules that matches requests from anywhere else:
http_access deny !MyNetwork
http_access ...
http_access ...
While that may stop outsiders, it may not be good enough. It won't stop insiders who intentionally, or unintentionally, try to forward spam through Squid. To add even more security, you should make sure that Squid never connects to another server's SMTP port:
acl SMTP_port port 25
http_access deny SMTP_port
In fact, there are many well-known TCP ports, in addition to SMTP, to which Squid should never connect. The default squid.conf includes some rules to address this. There, you'll see a Safe_ports ACL element that defines the good ports. A deny !Safe_ports rule ensures that Squid does not connect to any of the bad ports, including SMTP.
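An abbreviated sketch of those default rules looks something like this (the actual list in squid.conf is longer):

acl Safe_ports port 80 21 443 70 210 1025-65535
http_access deny !Safe_ports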
Duane Wessels discovered Unix and the Internet as an undergraduate student studying physics at Washington State University.
O'Reilly & Associates published Squid: The Definitive Guide in January 2004.
Chapter 8, "Advanced Disk Cache Topics," is available free online.
You can also look at the Table of Contents, the Index, and the full description of the book.