Browsing the tag opensolaris
I just received my “Guide to Open-Source Operating Systems” comparing Solaris with Linux from Sun’s marketing department. Here are some of the “facts” that made me cringe due to blatant lying and half-truths. Hey Sun, don’t let the facts get in your way.
Believe it or not, this is verbatim from the guide.
• Solaris is supported by more applications.
• Solaris holds performance and price/performance world records that demonstrate its speed and scalability on a variety of systems.
• Solaris is supported by Sun, the company dedicated to UNIX for more than two decades.
1. Let’s see, the first fact is just blatant lying. Last I checked, Linux supported IA-32, MIPS, x86-64, SPARC, DEC Alpha, Itanium, PowerPC, ARM, m68k, PA-RISC, s390, SuperH, M32R and many more platforms, while Solaris only supports SPARC, IA-32 and x86-64. Does anyone at Sun’s marketing department care to fact-check?
2. Depends on your definition of “supported.” Marketing is most likely referring to commercial support. I don’t have the numbers to back this up, but I doubt this holds true against Linux in 2009; maybe they had a case back in 1999. The majority of open source applications are developed against Linux, and Solaris compatibility is just an afterthought.
3. You win http://www.tpc.org/tpcc/results/tpcc_perf_results.asp
Sun develops some of the best hardware and software on the market, but their marketing department is a disaster. There can only be one Steve Jobs and his reality distortion field.
Once again I have been blindsided by yet another conservative out-of-the-box setting. IPFilter ships with a state table size that is far too small.
Here is how you can tell if you’re hitting the limit: run ipfstat and check for lost packets.
victori@opensolaris:~# ipfstat | grep lost
fragment state(in): kept 0 lost 0 not fragmented 0
fragment state(out): kept 0 lost 0 not fragmented 0
packet state(in): kept 798 lost 100
packet state(out): kept 612 lost 234
Notice that the packet state lines show a non-zero lost count for both in and out. This means IPFilter has been dropping client connections. Bummer.
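If you would rather not eyeball the output, the check can be scripted. This is only a minimal sketch, assuming ipfstat prints the packet state lines in the format shown above; the function name ipf_lost_total is made up for illustration and is not part of IPFilter.

```shell
#!/bin/sh
# ipf_lost_total: hypothetical helper, not part of IPFilter.
# Reads ipfstat output on stdin and prints the sum of the "lost"
# counters found on the "packet state" lines.
ipf_lost_total() {
    awk '/packet state/ {
        for (i = 1; i < NF; i++)
            if ($i == "lost") total += $(i + 1)
    }
    END { print total + 0 }'
}

# Example (on a host running IPFilter):
#   ipfstat | ipf_lost_total
```

Anything other than 0 means the state table has overflowed at some point since IPFilter was started.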
The default settings are quite conservative.
fr_statemax min 0x1 max 0x7fffffff current 4096
fr_statesize min 0x1 max 0x7fffffff current 5002
You need to shut down IPFilter and apply larger table size limits.
victori@opensolaris:~# /usr/sbin/ipf -T fr_statemax=18963,fr_statesize=27091
Let’s confirm that it works.
fr_statemax min 0x1 max 0x7fffffff current 18963
fr_statesize min 0x1 max 0x7fffffff current 27091
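The confirmation step can be scripted too. A rough sketch, assuming the tunables are listed in the "name min X max Y current Z" format shown above; the helper name is hypothetical, and `ipf -T list` as the source of that listing is my assumption.

```shell
#!/bin/sh
# ipf_tunable_current: hypothetical helper, not part of IPFilter.
# Reads tunable lines ("name min X max Y current Z") on stdin and
# prints the current value of the tunable named in $1.
# Note: awk -v may need nawk instead of the default awk on Solaris.
ipf_tunable_current() {
    awk -v name="$1" '$1 == name {
        for (i = 2; i < NF; i++)
            if ($i == "current") print $(i + 1)
    }'
}

# Example, assuming `ipf -T list` produces the listing:
#   ipf -T list | ipf_tunable_current fr_statemax
```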
Awesome, now all we need to do is re-enable IPFilter and there are no more lost packets.
To make this persistent across reboots edit ipf.conf
name="ipf" parent="pseudo" instance=0 fr_statemax=18963 fr_statesize=27091;
Then update the contents
This can be applied to any OS that uses IPFilter.
Update: The following information could be beneficial to some; however, my issues were actually caused by Caviar Black drives shipping with TLER disabled. You need to pay Western Digital a premium for their “RAID” drives with TLER enabled. So for anyone reading this, avoid consumer Western Digital drives if you plan on using them for RAID.
zfs_vdev_max_pending
I can’t believe how long I have been tolerating horrible concurrent IO performance on OpenSolaris running ZFS. Whenever IO-intensive writes are happening, the whole system slows to a crawl for any further IO. Running “ls” on an uncached directory is just painful.
victori@opensolaris:/opt# iostat -xnz 1
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   87.0    0.0 2878.1  0.0  0.0    0.0    0.4   0 100 c4t0d0
    0.0   83.0    0.0 2878.1  0.0  0.1    0.2    0.7   1  50 c4t1d0
    1.0    0.0   28.0    0.0  0.0  0.0    0.0    5.4   0   1 c4t2d0
Notice that c4t0d0 is 100% busy (the %b column). If a disk is pinned at 100%, good luck getting it to do any other operations such as reads.
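To spot saturated disks without reading every row, the %b column can be filtered. A small sketch, assuming the 11-column `iostat -xnz` layout shown above; busy_disks is a made-up name and the default threshold of 90 is arbitrary.

```shell
#!/bin/sh
# busy_disks: hypothetical helper. Reads `iostat -xnz` output on stdin
# and prints "device %b" for every device whose %b column is at or
# above the threshold given in $1 (default 90).
# Note: awk -v may need nawk instead of the default awk on Solaris.
busy_disks() {
    awk -v limit="${1:-90}" '
        # Data rows have 11 columns: %b is column 10, device is column 11.
        # The header row is skipped because its column 10 is not a number.
        NF == 11 && $10 ~ /^[0-9]+$/ && $10 + 0 >= limit + 0 {
            print $11, $10
        }'
}

# Example:
#   iostat -xnz 1 5 | busy_disks 90
```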
SATA disks do Native Command Queuing while SAS disks do Tagged Command Queuing, and this is an important distinction. OpenSolaris/Solaris seems to be optimized for the latter, with a command queue 32 wide set by default. This completely saturates SATA disks with IO commands, which in turn makes the system unusable for short periods of time.
Dynamically set the ZFS command queue to 1 to optimize for NCQ.
And add to /etc/system
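For reference, here is a hedged sketch of what both steps usually look like. The /etc/system line and the mdb one-liner follow the general Solaris tunable syntax as I understand it; treat them as assumptions rather than the exact commands used here, and note that the helper name add_zfs_queue_setting is made up.

```shell
#!/bin/sh
# add_zfs_queue_setting: hypothetical helper. Appends the
# zfs_vdev_max_pending setting to an /etc/system style file unless a
# line for that tunable is already present.
#   $1 = file to edit (default /etc/system), $2 = queue depth (default 1)
add_zfs_queue_setting() {
    file="${1:-/etc/system}"
    depth="${2:-1}"
    if grep 'zfs:zfs_vdev_max_pending' "$file" >/dev/null 2>&1; then
        echo "already set in $file"
    else
        echo "set zfs:zfs_vdev_max_pending = $depth" >> "$file"
        echo "added to $file"
    fi
}

# The running kernel can be changed without a reboot through mdb
# (assumption: standard tunable syntax, run as root):
#   echo zfs_vdev_max_pending/W0t1 | mdb -kw
```

Try it against a copy of /etc/system first; a typo in that file can leave you with a system that does not boot cleanly.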
Enjoy your OpenSolaris server on cheap SATA disks!
Recently the primary boot disk went bad on our server and I got blindsided by a non-bootable secondary mirror disk. All the data was intact but I could not boot from it. This required a slow re-installation and migration process that took a very long time.
• ZFS attach automatically partitions the drive as EFI.
• ZFS send/recv transfers of gzip-compressed datasets are slow.
Here is the correct way of getting both disks in the ZFS mirror to boot.
Plug the new drive that you want to add to the ZFS mirror into the server. If you’re hot swapping or adding the drive while the server is still on, you need to use cfgadm to configure it.
Now that the drive is configured and seen by the server you need to repartition it with format so it can be used as a bootable device.
AVAILABLE DISK SELECTIONS:
0. c4t0d0
/pci@0,0/pci8086,346c@1f,2/disk@0,0
1. c4t1d0
/pci@0,0/pci8086,346c@1f,2/disk@1,0
2. c4t2d0
/pci@0,0/pci8086,346c@1f,2/disk@2,0
* select your new drive *
# fdisk
* use fdisk to remove the EFI partition and add a solaris2 partition. *
Select the partition type to create:
1=SOLARIS2 2=UNIX 3=PCIXOS 4=Other
5=DOS12 6=DOS16 7=DOSEXT 8=DOSBIG
9=DOS16LBA A=x86 Boot B=Diagnostic C=FAT32
D=FAT32LBA E=DOSEXTLBA F=EFI 0=Exit?
This step is very important. If you do not repartition your drive, zfs attach will default the drive back to an EFI partition table, which is not bootable.
c4t0d0s2 — primary drive.
c4t1d0s2 — new drive that we are setting up.
You should now be able to attach the secondary drive to your mirror using the identical slice.
Once the mirror is done synchronizing, you need to install the bootloader on the new drive.
Updating master boot sector destroys existing boot managers (if any).
continue (y/n)?y
stage1 written to partition 0 sector 0 (abs 16065)
stage2 written to partition 0, 267 sectors starting at 50 (abs 16115)
stage1 written to master boot sector
Troubleshooting
raw device must be a root slice (not s2)
You did not re-partition the drive to a solaris2 partition. EFI partitions can’t be made bootable. Use the format tool to reconfigure the drive with a solaris2 partition.
cannot open/stat device /dev/rdsk/c1t0d0s0
You did not copy your label information from your primary to your secondary disk with prtvtoc and fmthard.
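The whole sequence after the fdisk step can be summarised as a dry run. This sketch only prints commands and executes nothing. The pool name, the use of slice 0 for the root pool, and the GRUB stage paths under /boot/grub are my assumptions, and mirror_plan is a made-up name.

```shell
#!/bin/sh
# mirror_plan: hypothetical dry-run helper. It only PRINTS the commands
# for the steps described above; nothing is executed.
#   $1 = pool name, $2 = primary disk, $3 = new disk
#   e.g. mirror_plan rpool c4t0d0 c4t1d0
# Assumptions: the root pool lives on slice 0 and the GRUB stage files
# are in /boot/grub.
mirror_plan() {
    pool="$1"; primary="$2"; new="$3"
    # Copy the slice layout (VTOC) from the primary to the new disk.
    echo "prtvtoc /dev/rdsk/${primary}s2 | fmthard -s - /dev/rdsk/${new}s2"
    # Attach the matching slice so ZFS does not relabel the disk as EFI.
    echo "zpool attach $pool ${primary}s0 ${new}s0"
    # After the resilver finishes, make the new disk bootable.
    echo "installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/${new}s0"
}
```

Review the printed commands against your own device names before running any of them by hand, and wait for `zpool status` to report the resilver as complete before installing the bootloader.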
For those interested in how we run Fabulously40.
1. Single server, OpenSolaris / 8 GB RAM / Quad Xeon X5355 / 100Mbit line.
2. Static and dynamic data cached up front on Varnish
3. Even though Nginx can handle L7 load balancing, Perlbal offers better flexibility with its plugin system
4. Jetty application servers easily scale out by using memcached as the session store
5. Write intensive operations are done asynchronously via the ActiveMQ message store system
6. One PostgreSQL database on RAID1 with a hot standby database on a third disk.
The application can do 6,000+ req/sec with the Varnish cache and 80-120 req/sec without it. The platform uses Wicket, Hibernate and Spring for its internals.
There you have it.

OpenSolaris uses a single-threaded malloc by default for all applications. The JDK that is compiled for Solaris is not linked against mtmalloc or the newer umem malloc implementation, both of which are optimized for multithreading. In a multithreaded application, a single-threaded malloc can degrade performance. When memory is allocated concurrently from multiple threads, all the threads must wait in a queue while malloc() handles one request at a time. This is called heap contention. To get around this contention point you can force the JDK to use the umem malloc.
LD_PRELOAD=/usr/lib/libumem.so /opt/jdk1.7.0/bin/java -jar start.jar
or
LD_PRELOAD=/usr/lib/libmtmalloc.so /opt/jdk1.7.0/bin/java -jar start.jar
This simple fix has really improved performance on our web service Fabulously40. The application went from serving 120 req/sec uncached to 170 req/sec. Not bad, no?
This also works wonders for MySQL and Varnish, two applications that really put those threads to use. We have dropped 100ms in response time with Varnish just by using umem for the malloc implementation.
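If the same launcher has to run on boxes where one of the libraries may be missing, the choice can be made at startup. A minimal sketch; pick_malloc_lib is a made-up helper, and the library paths are simply the two used in the commands above.

```shell
#!/bin/sh
# pick_malloc_lib: hypothetical helper. Prints the first path among its
# arguments that exists as a file and returns 0; returns 1 and prints
# nothing if none of them exist.
pick_malloc_lib() {
    for lib in "$@"; do
        if [ -f "$lib" ]; then
            echo "$lib"
            return 0
        fi
    done
    return 1
}

# Example: prefer umem, fall back to mtmalloc, otherwise leave the
# default malloc in place, then start the JVM as usual.
#   if lib=$(pick_malloc_lib /usr/lib/libumem.so /usr/lib/libmtmalloc.so); then
#       export LD_PRELOAD="$lib"
#   fi
```

Setting LD_PRELOAD only for the one process, as in the commands above, is still the safer habit; exporting it affects every child the script starts.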

