One of Boomcycle’s favorite things to do is to help successful clients overcome growing pains. We often see clients with good business models who are limping along with an inadequate or broken system built by well-intentioned but less experienced developers. In other cases, we see clients with good systems being managed poorly by a clueless third-party IT company.
Sometimes we see both at the same time.
One recent client has a website built on a modified version of Joomla, a popular content management system. The website handled most general operations well but was crashing on a daily basis. Our job in this case was to perform triage and locate the ailment as quickly as possible.
We had some discussions with the business owner, a really good guy and software developer with extensive experience with his system. Having inherited it from a prior developer, he had kept the system running and altered the code to solve a variety of problems. He was intimately acquainted with the code, especially the modified parts, yet had been unable to find anything in it that might be causing the crashes. His knowledge of the system’s workings proved extremely valuable as we diagnosed the problem.
The most salient symptom of the server’s ills was that the session table repeatedly crashed. We had seen this kind of behavior before in other open source software packages. It was common for a visit from the Googlebot to crash a session table because the bot would simply overwhelm the server with a swarm of concurrent requests. The session-handling code would end up creating thousands of session records for a bot that should never have a session in the first place, and the server would eventually hit a write error. One of our first suggestions was to alter the system to exclude bots from session handling.
But something else was fishy about the consistency and regularity of these crashes.
In every single case, the crashes happened in the wee hours of the morning, the time of day when the website’s cron jobs run, doing cleanup and backup duties. The Googlebot is a capricious beast and doesn’t arrive on a consistent schedule, and session table crashes due to a bot are rather infrequent occurrences. It had to be something else.
Boomcycle senior developer Jason therefore convinced the owner to install a script that would tell us when the server was getting busy. It was a cron job that ran every minute; if the system load climbed too high, it would send a text message to Jason’s phone so he could jump online and sniff out the problem (possibly ruining a romantic evening with his spouse!).
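We can’t reproduce Jason’s exact script here, but a minimal sketch of this kind of load-alert cron job might look like the following. The threshold, the email-to-SMS address, and a working mail(1) setup are all assumptions for illustration:

```shell
#!/bin/sh
# Minimal sketch of a load-alert script, run from cron every minute.
# The threshold and alert address below are hypothetical.

THRESHOLD=4.0

# Succeed (exit 0) when the first argument exceeds the second,
# comparing the two as floating-point numbers via awk.
load_exceeds() {
    awk -v l="$1" -v t="$2" 'BEGIN { exit (l + 0 > t + 0) ? 0 : 1 }'
}

# Current 1-minute load average, e.g. "3.17" out of
# "... load average: 3.17, 2.90, 2.45"
LOAD=$(uptime | sed 's/.*load average[s]*: //' | cut -d, -f1 | tr -d ' ')

if load_exceeds "$LOAD" "$THRESHOLD"; then
    # Many carriers accept email-to-SMS; this address is made up.
    command -v mail >/dev/null &&
        echo "Load is $LOAD on $(hostname)" \
            | mail -s "Server load alert" 5551234567@sms.example.net
fi
```

A one-minute cron interval keeps the alert latency low without adding any meaningful load of its own.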
Sure enough, the very first night we received complaints from the system. Jason logged into the box, examined the process list, and found a backup cron script devouring the CPU’s available cycles.
We eventually learned that the hosting provider had installed a backup script that was compressing over 30GB of files into an archive on the exact same hard drive the files lived on. The drive would grind to a halt on a nightly basis. When we pointed this out to the hosting provider and asked for at least a separate backup drive, we learned that the box was incapable of accommodating an additional hard drive. The site owner agreed that we should move his system to a different hosting company.
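The anti-pattern is easy to demonstrate. Here is a sketch using stand-in paths (the real site held roughly 30GB under its document root; every name below is hypothetical):

```shell
#!/bin/sh
# Demonstration of the backup anti-pattern with stand-in paths.
set -e

SRC=${SRC:-/tmp/demo-site}      # stand-in for the document root
DEST=${DEST:-/tmp/demo-backups} # stand-in for the backup location
mkdir -p "$SRC" "$DEST"

# The hosting provider's approach: archive the files onto the SAME
# disk they live on. Each nightly run reads the entire site and writes
# a compressed copy back to the identical spindle, saturating its I/O.
tar -czf "$DEST/site-$(date +%F).tar.gz" -C "$SRC" .

# A gentler pattern streams the archive off the box entirely, so the
# local disk only performs reads (backup.example.com is made up):
#   tar -cz -C "$SRC" . | ssh backup@backup.example.com \
#       'cat > /srv/backups/site.tar.gz'
```

Writing the archive to a second disk, or better yet to another machine, takes the write half of the workload off the drive that is busy serving the site.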
Disabling the backup script introduced a problem, however. How could the files be backed up? This question was part of a bigger problem: the system had to download large amounts of data at night. Because the data downloads were occurring sequentially, they were taking too long.
This is a classic growing pain: the system needs to handle more data and it must be backed up.
A Unique Software Solution
Fortunately, recent developments in cloud computing and multi-core servers presented a perfect solution. Jason constructed a multi-threaded PHP program to perform the downloads concurrently rather than sequentially. The program, which runs as a daemon, was installed on a Cloud Server at Rackspace.com, where it pushes the downloaded data to a content delivery network (CDN). The system can now rely on the cloud and the CDN to fetch the required files, deliver them quickly to the end user, and back them up more or less continuously.
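Jason’s daemon itself isn’t something we can publish, but the core idea, fetching many files at once rather than one after another, can be sketched in a few lines of shell. The URL list file, the worker count of eight, and the curl flags here are illustrative assumptions, not the production code:

```shell
#!/bin/sh
# Illustration only: the production downloader is a multi-threaded PHP
# daemon. This sketch shows the same concurrency idea using xargs.

fetch_all() {
    # -n 1: hand each worker a single URL from the list file;
    # -P 8: run up to eight curl processes at once instead of
    #       downloading one file after another.
    xargs -P 8 -n 1 curl -sSfO < "$1"
}

# Usage (the list file, one URL per line, is hypothetical):
#   fetch_all urls.txt
```

Run concurrently, the nightly window is bounded by the slowest transfer rather than the sum of all of them, which is the same property the PHP daemon exploits.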
Backup problem? Solved.
Data download problem? Solved.
Future plans may include configuring the system to instantiate numerous Cloud Servers on demand and then deallocate them when their work is done. This will allow for some serious scaling.