At Crittercism we use Chef to deploy and manage our servers. We started noticing very poor performance (high CPU usage) on our Chef machine. Luckily, we found an article on Hacker News that explained the cause.
dmesg showed the following output:
[2245059.916474] Clock: inserting leap second 23:59:60 UTC
[2247718.768385] Clocksource tsc unstable (delta = 1099511625387 ns)
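To check another host for the same symptom, one option is to search the kernel log for the leap second message. This is a minimal sketch: `saw_leap_second` is a hypothetical helper name, and the sample line is copied from the output above so the snippet runs without access to a real kernel log.

```shell
# Hypothetical helper: succeeds if the kernel log text on stdin
# contains the leap second insertion message.
saw_leap_second() {
  grep -q 'inserting leap second'
}

# Sample line copied from the dmesg output above.
sample='[2245059.916474] Clock: inserting leap second 23:59:60 UTC'

if printf '%s\n' "$sample" | saw_leap_second; then
  echo "leap second was inserted"
fi
```

On a live machine, pipe the real log in instead: `dmesg | saw_leap_second`.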
The bug itself is in the Linux kernel, but Java applications are particularly vulnerable. Chef is primarily Ruby-based, but it does have one Java component: "Chef Solr", a light wrapper around Apache Solr. Its function is to allow quick searches of any Chef metadata (such as the hundreds of nodes we have!).
The article points out a quick fix, which sure enough worked almost immediately:
date `date +"%m%d%H%M%C%y.%S"`
We also disabled ntp as part of the fix. Leaving it off may cause the system clock to drift in the long run, so we'll try to re-enable it tomorrow.
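The quick fix is easier to follow once the format string is unpacked. The inner `date +"%m%d%H%M%C%y.%S"` prints the current time as MMDDhhmmCCYY.ss, which is the form the outer `date` accepts for setting the clock, so the command sets the clock to the time it already shows. Below is a minimal sketch, assuming GNU date (for the `-d` option); the example instant is arbitrary and only illustrates the layout.

```shell
# Format string from the fix: month, day, hour, minute, century, year, then ".seconds".
fmt='+%m%d%H%M%C%y.%S'

# Render a fixed example instant (GNU date's -d option) to show the layout.
stamp=$(date -u -d '2012-07-01 00:30:45' "$fmt")
echo "$stamp"   # 070100302012.45

# Print what the one-liner would run right now, without running it
# (actually setting the clock requires root).
echo "date $(date "$fmt")"
```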
On a side note, our problem wasn't as bad as it was for some other folks, who had entire Linux clusters crash.
This stuff interest you? Crittercism is hiring a Senior Operations Engineer.