/etc/sysctl.conf
Almost all kernel parameters were historically defined as constants. However, using sysctl you can modify many of these parameters at runtime to fit your needs.
# check all existing sysctl parameters
$ sysctl -a

# change them temporarily
# (note: these kern.* and net.inet.* keys are BSD/macOS names;
#  the Linux keys used in the rest of this post live under fs.*, vm.* and net.*)
$ sudo sysctl -w kern.maxfilesperproc=200000
$ sudo sysctl -w kern.maxfiles=200000
$ sudo sysctl -w net.inet.ip.portrange.first=1024

# change them permanently
$ vim /etc/sysctl.conf
# edit ...

# load
$ sysctl -p /etc/sysctl.conf
Networking Sysctl Tweaks (edit /etc/sysctl.conf)
In networking, there are 5 layers:
- Application Layer – web browser / OS (Data)
- Transport Layer – TCP (Segments)
- Reliable protocol: whatever you send, the receiver must acknowledge (ACK) that it received it (you can observe this with ss, as shown after this list)
- If no ACK arrives, the sender retransmits
- Network Layer – IP (Packets)
- Data Link Layer – hardware addressing such as MAC addresses (Frames)
- Physical Layer – data flows through the wire
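You can poke at some of these layers directly from a shell. A small sketch (the interface name eth0 is an assumption):

# transport layer: per-connection TCP internals such as cwnd, rtt and retransmissions
$ ss -ti

# data link layer: the MAC address of an interface
$ ip link show eth0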
# Increase system file descriptor limit (this is system-level)
fs.file-max = 100000

# Discourage Linux from swapping idle processes to disk (default = 60)
vm.swappiness = 10

# if you have many concurrent connections, widen the ephemeral port range
net.ipv4.ip_local_port_range = 10240 65535

# Increase Linux autotuning TCP buffer limits.
# Set max to 16MB for 1GE and 32M (33554432) or 54M (56623104) for 10GE
# increase TCP max buffer size
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

# increase Linux autotuning TCP buffer limits (min, default, max # of bytes to use)
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 40960

# increase read/write TCP buffers to allow for larger window sizes.
# This lets more data be in flight without waiting for ACKs, increasing throughput
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# increase the max packet backlog (length of the processor input queue)
net.core.netdev_max_backlog = 50000

# Make room for more TIME_WAIT sockets due to more clients,
# and allow them to be reused if we run out of sockets
net.ipv4.tcp_max_syn_backlog = 30000
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 10

# Disable TCP slow start on idle connections
net.ipv4.tcp_slow_start_after_idle = 0

# If your servers talk UDP, also up these limits
net.ipv4.udp_rmem_min = 8192
net.ipv4.udp_wmem_min = 8192

# increase the TCP listen() backlog limit
net.core.somaxconn = 1000

# recommended default congestion control is htcp
net.ipv4.tcp_congestion_control = htcp

# recommended for hosts with jumbo frames enabled
net.ipv4.tcp_mtu_probing = 1

# Disable source routing and redirects
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.accept_source_route = 0

# Log packets with impossible addresses for security
net.ipv4.conf.all.log_martians = 1
- Increase max open files to 100,000 from the default (typically 1024). In Linux, every open network socket requires a file descriptor. Increasing this limit will ensure that lingering TIME_WAIT sockets and other consumers of file descriptors don’t impact our ability to handle lots of concurrent requests.
- Decrease the time that sockets stay in the TIME_WAIT state by lowering tcp_fin_timeout from its default of 60 seconds to 10. You can lower this even further, but go too low and you can run into socket close errors on networks with lots of jitter. We also set tcp_tw_reuse to tell the kernel it can reuse sockets in the TIME_WAIT state (see the verification commands after this list).
- We won’t tune the total TCP memory (tcp_mem), since this is automatically tuned based on available memory by Linux.
- NOTE: Since some of these settings can be cached by networking services, it’s best to reboot to apply them properly (sysctl -p does not work reliably).
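A quick way to confirm the settings took effect after loading them (output values will vary per system):

# system-wide FD usage: allocated, free, max
$ cat /proc/sys/fs/file-nr

# read back individual settings
$ sysctl net.ipv4.tcp_fin_timeout net.ipv4.tcp_tw_reuse

# count sockets currently in TIME_WAIT
$ ss -tan state time-wait | wc -l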
Increase TCP throughput by increasing the size of the interface transmit queue:
ifconfig eth0 txqueuelen 1000
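On modern distributions ifconfig is deprecated; the iproute2 equivalent (still assuming the interface is named eth0) would be:

$ sudo ip link set dev eth0 txqueuelen 1000

# verify the queue length (shown as qlen)
$ ip link show eth0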
Shell Limits
An application could be run as a regular user on the host system. If so, you may need to give that user different limits.
/etc/security/limits.conf (File Descriptors and Max # of processes)
# for just the user "nobody"
nobody soft nofile 4096
nobody hard nofile 63536
nobody soft nproc 2047
nobody hard nproc 16384

# for every user
* soft nofile 100000
* hard nofile 100000
- Don’t set the hard nofile limit equal to /proc/sys/fs/file-max: that user could then consume all system file descriptors and starve the entire system.
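To sanity-check this, compare the system-wide ceiling with the effective limit of a running process (1234 below is a placeholder PID):

# system-wide ceiling
$ cat /proc/sys/fs/file-max

# effective limit of a specific process
$ grep "Max open files" /proc/1234/limits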
/etc/pam.d/sshd
PAM needs to load the modified limits.conf for the new limits to apply to login sessions.
# ensure pam includes our limits
session required pam_limits.so

# confirm it by running (in a new login session)
$ ulimit -n
TCP Congestion Window
Finally, let's increase the initial TCP congestion window (initcwnd) to 10 segments; older kernels default to 3. This is done on the route, which makes it a more manual process than our sysctl settings. First, use ip route to find the default route (the line beginning with default):
$ ip route
default via 10.248.77.193 dev eth0 proto kernel
10.248.77.192/26 dev eth0 proto kernel scope link src 10.248.77.212
Copy that line, and paste it back to the ip route change command, adding initcwnd 10 to the end to increase the congestion window:
$ sudo ip route change default via 10.248.77.193 dev eth0 proto kernel initcwnd 10
To make this persistent across reboots, you’ll need to add a few lines of bash like the following to a startup script somewhere. Often the easiest candidate is just pasting these lines into /etc/rc.local:
defrt=$(ip route | grep "^default" | head -1)
ip route change $defrt initcwnd 10
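After the change, ip route should show the new window on the default route (output abbreviated, using the example addresses above):

$ ip route | grep "^default"
default via 10.248.77.193 dev eth0 proto kernel initcwnd 10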
Once you’re done with all these changes, you’ll need to either bundle a new machine image, or integrate these changes into a system management package such as Chef or Puppet.
Virtual Memory Tweak
Swap file
Swap behavior (vm.swappiness) was already covered in the sysctl tweaks above.
Page Cache
Under Linux, the page cache accelerates access to files on non-volatile storage. When Linux first reads from or writes to media like hard drives, it also stores the data in unused areas of memory, which act as a cache. If the same data is read again later, it can be served quickly from this in-memory cache.
# check memory status; the main memory currently used for the page cache
# is indicated in the "cached" column
$ free -m
             total       used       free     shared    buffers     cached
Mem:         15976      15195        781          0        167       9153
-/+ buffers/cache:       5874      10102
Swap:         2000          0       1999

# Writing to disk will first write to the page cache (indicated as dirty),
# then periodically transfer to the underlying storage device,
# or you can have the system call sync or fsync to flush it.
$ dd if=/dev/zero of=testfile.txt bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0,0121043 s, 866 MB/s

$ cat /proc/meminfo | grep Dirty
Dirty: 10260 kB
$ sync
$ cat /proc/meminfo | grep Dirty
Dirty: 0 kB
vm.dirty_ratio (default=20)
The percentage of total available memory (free plus reclaimable pages) at which a process that is generating disk writes will itself start writing out dirty data.
## Add this line ##
vm.dirty_ratio = 80
vm.dirty_background_ratio (default=10)
This value determines the percentage of memory that can contain dirty pages before the background kernel flusher threads start to write dirty pages to disk. If you have 1GB of RAM and you set this to 10 then it would take 100MB of dirty pages to begin the flush process.
## Add this line ##
vm.dirty_background_ratio = 5
vm.dirty_expire_centisecs (default=3000)
The value is expressed in hundredths of a second. It defines the age at which dirty pages become eligible to be written to disk by the kernel flusher threads. The longer this value, the higher the odds of data loss on a crash, but also the longer data stays in memory in case a program needs it again.
## Add this line ##
vm.dirty_expire_centisecs = 12000
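After adding the three vm.dirty_* lines, you can load and read them back (a minimal check):

# load the new settings
$ sudo sysctl -p

# read them back
$ sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_expire_centisecs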
File System Tweaks
vim /etc/rc.local

## Add this line ##
echo noop > /sys/block/sda/queue/scheduler
Make sure that /etc/rc.local is executable, otherwise the changes will not get applied on reboot; a simple chmod +x /etc/rc.local should do the trick.
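You can check which scheduler is active at any time; the one shown in brackets is in use. Note that on newer kernels using the multi-queue block layer, noop is replaced by none. Example output from an older kernel:

$ cat /sys/block/sda/queue/scheduler
[noop] deadline cfq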