Chef Solr and the Leap Second Bug

At Crittercism we use Chef to easily deploy and manage our servers. We started noticing very poor performance (high CPU usage) on our Chef machine. Luckily, we found this article on Hacker News:

http://blog.wpkg.org/2012/07/01/java-leap-second-bug-30-june-1-july-2012-fix/

dmesg showed the following output:

[2245059.916474] Clock: inserting leap second 23:59:60 UTC
[2247718.768385] Clocksource tsc unstable (delta = 1099511625387 ns)
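To check whether another machine went through the same leap-second insertion, you can search its kernel log for the message shown above. The helper below is a minimal sketch. The function name `saw_leap_second` is our own, and it assumes your kernel prints the same message text ours did.

```shell
# saw_leap_second: hypothetical helper. Reads kernel log text on stdin and
# exits with status 0 if it contains the leap-second insertion message.
saw_leap_second() {
  grep -q 'Clock: inserting leap second'
}

# Usage on a live machine (reading dmesg may require root on some systems):
#   dmesg | saw_leap_second && echo "leap second was inserted"
```

A match only tells you the leap second was applied. The high CPU usage is the symptom that actually points to the bug.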

The bug itself is in the Linux kernel, but Java applications are particularly vulnerable. Chef is primarily Ruby-based, but it does have one Java component: “Chef Solr”, a light wrapper around Apache Solr. Its function is to allow quick searches of Chef metadata (such as the hundreds of nodes we have!).


[Image: Chef architecture diagram, from http://wiki.opscode.com/display/chef/Architecture]

The article points out a quick fix, which sure enough worked almost immediately:

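# Stop the NTP daemon
/etc/init.d/ntp stop
# Set the clock to the time it already reads (requires root); this is the step that clears the problem
date `date +"%m%d%H%M%C%y.%S"`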
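The inner `date` call formats the current time in the MMDDhhmmCCYY.ss layout that `date` accepts as an argument when setting the clock. As a harmless illustration that does not change the clock, the sketch below applies the same format to the Unix epoch instead of the current time. It assumes GNU coreutils `date`, which supports `-u` (use UTC) and `-d @SECONDS`.

```shell
# Format the Unix epoch (1970-01-01 00:00:00 UTC) in the same
# MMDDhhmmCCYY.ss layout used by the fix above. This only prints; it does
# not set the clock. -u forces UTC; -d @0 is GNU date syntax for
# "0 seconds since the epoch".
epoch_stamp=$(date -u -d @0 +"%m%d%H%M%C%y.%S")
echo "$epoch_stamp"   # prints 010100001970.00
```

Reading it left to right: month 01, day 01, hour 00, minute 00, century 19, year 70, seconds 00.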

Disabling NTP may cause your system clock to drift in the long run, so we’ll try to re-enable it tomorrow.

On a side note, we got off lightly compared to some other folks, who had entire Linux clusters crash.

Does this stuff interest you? Crittercism is hiring a Senior Operations Engineer.
