All of the interesting technological, artistic or just plain fun subjects I'd investigate if I had an infinite number of lifetimes. In other words, a dumping ground...

Thursday 21 August 2008

C++: pointer arguments can't be changed in functions

A function receives a copy of the pointer, so it can update what the pointer points to, but not the caller's pointer itself.

#include <cstdio>

int t1;
int t2;

// s1 is a copy of the caller's pointer: reassigning it inside the
// function has no effect on the caller.
void upPoint(int * s1)
{
    printf("upPoint: p1 before %p, %d\n", (void *)s1, *s1);
    s1 = &t2;
    printf("upPoint: p1 after  %p, %d\n", (void *)s1, *s1);
}

int main()
{
    t1 = 1;
    t2 = 2;
    int * p1 = &t1;

    printf("p1 before %p, %d\n", (void *)p1, *p1);
    upPoint(p1);
    printf("p1 after  %p, %d\n", (void *)p1, *p1);

    return 0;
}

g++ -Wall -o pointUpdate pointUpdate.cc

[tohare@masrtcsdev01 test]$ ./pointUpdate
p1 before 0x8049b60, 1
upPoint: p1 before 0x8049b60, 1
upPoint: p1 after  0x8049b64, 2
p1 after  0x8049b60, 1
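
The flip side: to let a function change the caller's pointer, pass the pointer's address (or, in C++, a reference to the pointer). A quick sketch of both approaches, my own example in the same style as above:

#include <cstdio>

int t1 = 1;
int t2 = 2;

// Takes the address of the caller's pointer; *s1 is the caller's
// pointer itself, so this assignment sticks.
void upPointPtr(int ** s1)
{
    *s1 = &t2;
}

// C++ alternative: a reference to the pointer.
void upPointRef(int *& s1)
{
    s1 = &t1;
}

int main()
{
    int * p1 = &t1;

    upPointPtr(&p1);
    printf("after upPointPtr: %p, %d\n", (void *)p1, *p1);  // now &t2, prints 2

    upPointRef(p1);
    printf("after upPointRef: %p, %d\n", (void *)p1, *p1);  // back to &t1, prints 1

    return 0;
}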

Sunday 17 August 2008

Qt4 - basic build process

I'm using Ubuntu and have installed everything to do with Qt4 via the Synaptic package manager.
Now copy the source files off this page:
http://doc.trolltech.com/4.3/widgets-tetrix.html
Then, in the directory containing the sources, run:

qmake -project
qmake
make

qmake -project creates a tetrix.pro project file, qmake generates a Makefile from it, and make builds the program. Easy peasy.

Wednesday 13 August 2008

LMBench - benchmarking tool

LMbench - Tools for Performance Analysis

What is LMbench?

  • Bandwidth benchmarks
    • Cached file read
    • Memory copy (bcopy)
    • Memory read
    • Memory write
    • Pipe
    • TCP
  • Latency benchmarks
    • Context switching
    • Networking: connection establishment, pipe, TCP, UDP, and RPC hot potato
    • File system creates and deletes
    • Process creation
    • Signal handling
    • System call overhead
    • Memory read latency
  • Miscellaneous
    • Processor clock rate calculation
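
To actually run the suite (going from memory of the LMbench docs here, so treat the exact make targets as an assumption): unpack the source tarball, then

make results
make see

make results builds everything, asks some configuration questions on the first run, and runs the full benchmark set; make see summarizes the collected results.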

Book - The Practice of Programming, by Brian Kernighan and Rob Pike

Clearsilver HTML templates in C


What is Clearsilver?

Clearsilver is a fast, powerful, and language-neutral HTML template system. In both static content sites and dynamic HTML applications, it provides a separation between presentation code and application logic which makes working with your project easier.

The design of Clearsilver began in 1999, and it evolved during its use at onelist.com, egroups.com, and Yahoo! Groups. Today many other projects and websites use it.

Why use Clearsilver?

High Performance and Language Neutral. Because Clearsilver is written as a C library and exported to languages like Python, Perl, Java, and Ruby via modules, it is extremely fast. This also means you can work with the same template system independent of the language your project is in.

Pluggable Look and Feel. Clearsilver makes it easy to face lift a site by providing a new set of templates. It is possible to easily run more than one look and feel at once, and share components with a base look and feel to reduce maintenance.

Internationalization Support. Clearsilver makes it trivial to support internationalization. You write your templates in your native language, and the included tools automatically extract and catalog language strings for translation.

Advanced features. Built-in support for advanced features such as gzip compression, smart whitespace stripping, parametric macros, an online debugging mode, URL and JavaScript string escaping, and more.

How can I learn more?

The Clearsilver documentation explains the theory of operation for Clearsilver itself, the C API, and the extension-module APIs.

Template Basics

The rendering process for Neotonic ClearSilver has two pieces: the dataset and the ClearSilver (CS) template files themselves. The dataset is a hierarchical namespace of strings which statements in the CS template reference. The CS template contains HTML and CS statements intermixed. Together they define the output of a given page.
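
A tiny example of the two pieces (my own, with made-up names, using the syntax from the Clearsilver docs). The dataset is defined in HDF format:

Page.Title = My Projects
Project.0.Name = tetrix
Project.1.Name = pointUpdate

and a CS template references it like this:

<title><?cs var:Page.Title ?></title>
<ul>
<?cs each:p = Project ?>
  <li><?cs var:p.Name ?></li>
<?cs /each ?>
</ul>

Rendering the template against the dataset produces the final HTML, with each <?cs ... ?> statement replaced by data from the hierarchy.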

Monday 11 August 2008

Linux write cache

From:

Greg Smith's Note Magnet


A Linux write cache mystery

One happy coincidence for me last month was that I discovered a friend of mine had built a Linux-based server running PostgreSQL and put it into a colo, something I had been pricing out myself. He created an account for me, and now I've got a place to host some personal MediaWiki projects I'd been planning. One known issue my friend mentioned is that he'd been running into regular problems where the whole server just froze up for a few seconds. Because of the pattern of when it happened, he suspected it was related to heavy writes, and furthermore suspected the software RAID-1 implementation. Since that seemed unlikely to me, I told him to hold off on breaking the RAID until I could take a look at things.

The server is a quad-core system with 8GB of RAM and a pair of SATA disks in software RAID-1. The OS is CentOS 5.2, based on the RHEL5 release, and it's using kernel 2.6.18 (the regular one, not the Xen one).

I started by doing the same read/write testing I always do on a system whose I/O I don't necessarily trust or understand: that procedure is outlined at Testing your disks for PostgreSQL. Since the server has 8GB of RAM I used 2,000,000 blocks. I opened a couple of sessions to the server, executed the test in one window, top in a second, and "vmstat 1" in a third. Watching vmstat at one-second intervals is one of the most useful things you can do to figure out where the bottlenecks are on a system.

In this case, what quickly became apparent is that the system was alternating between healthy periods that looked like this:


procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 5 5196 46428 28160 6188640 0 0 0 539720 1261 673 0 28 25 47 0
0 6 5196 47268 28212 6047744 0 0 0 988400 1235 615 0 21 25 54 0
0 6 5196 46980 28272 6047216 0 0 0 64032 1260 658 0 6 13 81 0
1 5 5196 45884 28336 6046788 0 0 0 61568 1281 677 0 7 0 93 0
1 5 5196 47276 28400 6043408 0 0 0 65632 1267 665 0 7 0 93 0
0 6 5196 46272 28460 6044080 0 0 0 65568 1272 641 0 6 0 94 0
0 6 5196 48188 28524 6042420 0 0 0 65536 1271 626 0 6 0 94 0
0 6 5196 46228 28592 6044836 0 0 0 66928 1298 668 0 7 0 94 0
0 5 5196 46648 28652 6044812 0 0 0 61504 1264 648 0 6 8 86 0

The bo (blocks out) number is the one to watch on this write test. That's in KB/s, so the entries in the later section here are all approximately 65MB/s. But at the beginning, it's writing into the Linux disk cache at a really high speed, as much as 988MB/s. Note that these numbers are total I/O, which includes both of the disks in the RAID-1 pair. That means the actual per-disk write rate is closer to 32MB/s, a bit on the low side, but that's presumably because the disks are already backlogged with writes from the initial burst.

That's a healthy period. Here's what the unhealthy ones looked like:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 4 3780 935592 32048 5205528 0 0 0 0 1270 371 0 4 25 71 0
0 4 3780 945140 32048 5205528 0 0 0 0 1293 383 0 3 25 72 0
0 4 3780 954316 32048 5205528 0 0 0 0 1271 370 0 3 25 72 0
0 4 3780 963616 32048 5205528 0 0 0 0 1293 385 0 3 25 72 0
1 4 3780 973288 32048 5205528 0 0 0 0 1293 372 0 3 25 72 0
0 4 3780 982464 32048 5205528 0 0 0 0 1280 395 0 3 25 72 0
0 4 3780 992384 32048 5205528 0 0 0 0 1283 373 0 4 25 71 0
0 4 3780 1002180 32048 5205528 0 0 0 0 1320 380 0 3 25 72 0
0 4 3780 1011480 32048 5205528 0 0 0 0 1280 371 0 3 25 72 0
0 4 3780 1021028 32048 5205528 0 0 0 0 1315 389 0 4 25 71 0
0 4 3780 1030204 32048 5205528 0 0 0 0 1280 372 0 3 25 72 0
0 4 3780 1039132 32048 5205528 0 0 0 0 1279 383 0 3 25 72 0
0 4 3780 1049052 32048 5205528 0 0 0 0 1273 369 0 3 25 72 0
0 4 3780 1058600 32048 5205528 0 0 0 0 1285 388 0 3 25 72 0
0 4 3780 1067900 32048 5205528 0 0 0 0 1259 373 0 3 25 72 0
0 4 3780 1077696 32048 5205528 0 0 0 0 1293 388 0 4 25 71 0
0 4 3780 1087740 32048 5205528 0 0 0 0 1267 371 0 3 25 72 0
0 4 3780 1096296 32048 5205528 0 0 0 0 1262 385 0 3 25 72 0
0 4 3780 1106216 32048 5205528 0 0 0 0 1274 367 0 3 25 72 0
1 3 3780 898384 32268 5428364 0 0 0 8 1344 489 1 19 25 55 0
2 4 3780 313476 32816 5991000 0 0 0 0 1248 374 0 28 25 47 0
1 5 5196 47324 27828 6253400 0 1928 0 1928 1289 470 0 19 17 64 0

That's over 20 seconds straight where zero blocks were written. That certainly seems to match the reported problem behavior of a long unresponsive period, and sure enough some of the sessions I had open were less responsive while this was going on. The question, then, is why it was happening. The disks seemed to be working well enough; here's the summary at the end of the dd run (the version of dd included in RHEL5 now provides this for you):

16384000000 bytes (16 GB) copied, 209.117 seconds, 78.3 MB/s

78MB/s to each disk in the pair is completely reasonable.

I wrote a long paper on how Linux handles heavy writes called The Linux Page Cache and pdflush because I never found a source that really covered what happens in this situation. What I recommend there is watching /proc/meminfo to see what's going on. Here's a little shell bit you can execute to do that:

while [ 1 ]; do cat /proc/meminfo; sleep 1; done

With some practice you can note which numbers are constantly moving, catch when the bad behavior occurs, then hit control-C to break and peruse the last few entries in your terminal app's scrollback buffer. Here's what I found right around the same time as the lull periods:

MemTotal: 8174540 kB
MemFree: 62076 kB
Buffers: 21724 kB
Cached: 6158912 kB
SwapCached: 0 kB
Active: 1126936 kB
Inactive: 6101688 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 8174540 kB
LowFree: 62076 kB
SwapTotal: 16771840 kB
SwapFree: 16766644 kB
Dirty: 6640 kB
Writeback: 3230792 kB ***
AnonPages: 1047724 kB
Mapped: 65736 kB
Slab: 823308 kB
PageTables: 23036 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 20859108 kB
Committed_AS: 1893448 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 271572 kB
VmallocChunk: 34359465955 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB

Note the line I starred there for Writeback. At the point where the system was stalling, a full 3.2GB of data was queued to write. That's 40% of RAM. Going back to my Linux page cache paper, you'll find that number listed: 40% is the point where Linux switches to the high dirty_ratio behavior, where all processes are blocked for writes. On a fast server with this much RAM, you can fill gigabytes of RAM in seconds, but writing that out to disk is still going to take a long time. If we work out the math, 3.2GB to write to two disks capable of 78MB/s each works out to...20.5 seconds. Well, look at that: almost exactly the length of our long slow period, where process writes were stalled waiting for the kernel to clean up. I love it when the math comes together.

So, what to do? Well, this problem (way too many writes buffered on systems with large amounts of RAM) was so obvious that in the 2.6.22 Linux kernel, the defaults for the caching here were all lowered substantially. This is from the release notes to 2.6.22:

Change default dirty-writeback limits. This means the kernel will write "dirty" caches differently...dirty_background_ratio defaulted to 10, now defaults to 5. dirty_ratio defaulted to 40, now it's 10

A check of this server showed it was using the 2.6.18 defaults as expected:

[gsmith@server ~]$ cat /proc/sys/vm/dirty_ratio
40
[gsmith@server ~]$ cat /proc/sys/vm/dirty_background_ratio
10

So what I suggested to my friend the server admin was to change these to the new values that are now standard in later kernels. It's easy to put these lines into /etc/rc.d/rc.local to make this change permanent after trying it out:

echo 10 > /proc/sys/vm/dirty_ratio
echo 5 > /proc/sys/vm/dirty_background_ratio
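
Equivalently (my note, not from the original article), the same settings can be made permanent in /etc/sysctl.conf, which is the more usual place for them:

vm.dirty_ratio = 10
vm.dirty_background_ratio = 5

Running sysctl -p applies them immediately without a reboot.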

After doing that, I re-ran the dd test and things were much better. There were still a few small blips in throughput: during the 4-minute test I found one 4-second period where writes dropped to the 2MB/s level. But for the most part, the giant bursts followed by lulls were gone, replaced by a fairly steady 130MB/s of writing the whole time. The final dd numbers looked like this after the retuning:

16384000000 bytes (16 GB) copied, 261.112 seconds, 62.7 MB/s

So this did drop average and peak throughput a bit. That's usually how it works: the best throughput to disks comes from writing in large bursts, which is efficient but very disruptive. But that's a small price to pay for making the many-second-long pauses go away.

This particular problem shows up in all kinds of places where heavy writing is being done. Tuning these parameters is also one of the suggestions I make for people running PostgreSQL 8.2 or earlier who want to tune checkpoint behavior. In that context, there have even been reports of people turning this particular write cache off altogether, which again lowers average throughput, but there it was worth it for how much it decreased worst-case behavior.

Time will tell if there's anything else going on that was contributing to the pauses originally reported that is still happening on this server, but this looks like a pretty clear smoking gun that's now been holstered.

g++ error: multiple declarations

I get this error with g++ version 3.4.6:
yms.cc:1317: error: multiple declarations `NullWiException' and `e'

And it points to this line:
    } catch (NullWiException e) {
I fixed it by changing it to a reference:
    } catch (NullWiException &e) {
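
Here's a minimal self-contained version of the fix (the exception class is a stand-in, since I don't have the real NullWiException definition). Catching by reference also avoids copying the exception object and slicing derived types:

#include <cstdio>
#include <stdexcept>

// Stand-in for the real NullWiException.
class NullWiException : public std::runtime_error {
public:
    NullWiException() : std::runtime_error("null Wi") {}
};

int main()
{
    try {
        throw NullWiException();
    } catch (NullWiException &e) {  // by reference: compiles cleanly, no copy
        printf("caught: %s\n", e.what());
    }
    return 0;
}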

Tuesday 5 August 2008

Reed-Solomon codes and file descriptors

Hardening your files with Reed-Solomon codes



The idea behind this process is error-correcting codes, such as the ubiquitous Reed-Solomon. With Reed-Solomon, parity bytes are used to protect a block of data from a specified maximum number of errors per block. In the tools described below, a block of 223 bytes is shielded with 32 bytes of parity. The original 223 bytes are then morphed into 255 "shielded" ones, and can be recovered even if 16 bytes from inside the "shielded" block turn to noise...
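
To spell out the arithmetic behind those numbers (standard Reed-Solomon behavior, not stated explicitly above): a code with n total bytes and k data bytes per block can correct up to t = (n - k) / 2 byte errors, so here t = (255 - 223) / 2 = 16.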


Secure File Descriptor Handling

During the 2.6.27 merge window a number of my patches were merged, and now we are at the point where we can securely create file descriptors without the danger of leaking information. Before I go into the details, let's get some background information.

A file descriptor in the Unix/POSIX world has lots of state associated with it. One bit of information determines whether the file descriptor is automatically closed when the process executes an exec call to start executing another program. This is useful, for instance, to establish pipelines. Traditionally, when a file descriptor is created (e.g., with a plain open() call) this close-on-exec flag is not set, and a programmer has to explicitly set it using

   fcntl(fd, F_SETFD, FD_CLOEXEC);

Closing descriptors across an exec is a good idea for two main reasons:

  • the new program's file descriptor table might fill up; every open file descriptor consumes resources.
  • more importantly, information might be leaked to the second program. That program might get access to information it normally wouldn't have access to.

It is easy to see why the latter point is such a problem. Assume this common scenario:

A web browser has two windows or tabs open, both loading a new page (maybe triggered through Javascript). One connection is to your bank, the other to some random Internet site. The latter contains some object which must be handled by a plug-in. The plug-in could be an external program processing some scripting language. The external program will be started through a fork() and exec sequence, inheriting from the web browser process all the open file descriptors not marked close-on-exec.
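
The race-free fix the 2.6.27-era work enables is to set close-on-exec atomically at creation time, rather than in a separate fcntl() call that another thread's fork()/exec can slip in front of. A rough sketch of old versus new (O_CLOEXEC for open() itself arrived a bit earlier, in 2.6.23; pipe2() is one of the new 2.6.27 interfaces; the file path is just an example):

#define _GNU_SOURCE   /* for pipe2() */
#include <fcntl.h>
#include <unistd.h>

int main()
{
    /* Old, racy way: a fork()+exec in another thread between the
       open() and the fcntl() leaks the descriptor. */
    int fd = open("/etc/hostname", O_RDONLY);
    if (fd >= 0) {
        fcntl(fd, F_SETFD, FD_CLOEXEC);
        close(fd);
    }

    /* Race-free way: the flag is set atomically at creation. */
    fd = open("/etc/hostname", O_RDONLY | O_CLOEXEC);
    if (fd >= 0)
        close(fd);

    /* Same idea for pipes. */
    int p[2];
    if (pipe2(p, O_CLOEXEC) == 0) {
        close(p[0]);
        close(p[1]);
    }
    return 0;
}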
