Hosting « Resolute Digital Tech Blog
Resolute Digital Tech Blog
Resolute Digital Technical Happenings

Archive for the ‘Hosting’ Category

Jun 17, 2009 by techyogi

As mentioned in our media blog, I just spent the last 3 weeks building an ad serving network for Resolute Digital. Our partners have worked with all the major ad networks in our careers, and yet I find myself liking OpenX more and more with each new version. It is open source, which I am a huge proponent of, and does exactly what we need.

Things start getting interesting, however, when you need to scale. I have used OpenX in the past to serve a few million ads per month on one large server without issue. My goal in this instance was to be able to serve 1 BILLION ads per month out of the gate for one of our larger clients. This requires a whole different level of planning, so I whipped out my calculator and started doing some math:

1 billion ads per month. Presumably, this would mostly come in during business hours on weekdays, or a similar pattern on different days. Everyone has their own way of calculating load, and I profess mine to be “practically conservative”. I assume that most of my traffic comes in on weekdays during “global” business hours (peak), figuring that this is the most traffic I’ll likely get at any one time barring a “Digg” effect, and I try to build so that this peak won’t use more than 50% of my “real” capacity, as follows:

1 billion / 20 week days = 50,000,000 ads per day

50,000,000 per day / 12 hours = 4,166,667 ads per hour

4,166,667 ads per hour / 60 minutes = 69,444 ads per minute

69,444 ads per minute / 60 seconds = 1,157 ads per second
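The same back-of-the-envelope math can be sketched in a few lines of Python. This is only an illustration of the calculation above: the function name and its parameters are hypothetical, and the defaults (20 weekdays, 12 peak hours per day) are the assumptions stated in the text. The doubling at the end applies the 50% capacity rule.

```python
def peak_ads_per_second(ads_per_month, weekdays=20, peak_hours_per_day=12):
    """Spread a monthly ad volume over weekday peak hours only."""
    per_day = ads_per_month / weekdays
    per_hour = per_day / peak_hours_per_day
    per_minute = per_hour / 60
    return per_minute / 60


# Round to whole ads per second, as in the hand calculation.
peak = round(peak_ads_per_second(1_000_000_000))

# Peak load should use no more than 50% of real capacity,
# so the capacity to build for is twice the peak.
capacity_target = 2 * peak

print(f"peak: {peak} ads/s, capacity target: {capacity_target} ads/s")
```

Changing `weekdays` or `peak_hours_per_day` shows how sensitive the target is to the traffic pattern you assume.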

1,157 ads per second! That’s quite a lot! Now I know I need to be able to serve a total of 2,314 ads per second (using only 50% capacity) in order to meet my goal. I figure the best next move is to get OpenX up and running on one beefy server and benchmark it. I also read all the OpenX site documentation about distributed processing, scalability, performance, etc., but that comes later. Now, a history lesson that explains the title of this blog entry.

Once upon a time, I was the CTO for sourcetool, a “vertical search” company. Our site was an online directory of products, parts, and service suppliers, primarily for procurement managers at large companies who needed to source these products and services. We encouraged businesses to create a free profile, and used a GB-5005 Google Search Appliance (GSA) we bought for about $200k to provide what we thought was the best full-text search possible. At the time, Google had just introduced direct-to-database search (not scraping HTML pages, but actually indexing our database directly), and while it worked, it was new and full of bugs. I tried for over a year to get it working perfectly, with Google’s help, but ultimately it only “mostly” worked. Eventually, after testing lots of options, we decommissioned the GSA and started using a load-balanced Solr implementation that gave us 95% of what we had, using open source software. Yet another win for open source! In the 2 years since, this beast of a server rack has sat idle, and my partner and I have always wanted to recoup some value from it, as it was a huge investment that never really delivered.

Fast forward to now: Dan Savage and I, along with our two other partners Brian Mcnamee and Jarod Caporino, founded Resolute Digital and are off to a fantastic start. We are getting into the ad serving business, and I figure these servers are loaded and would be the perfect platform for our ad serving network. They are 4-core Xeon servers, each with 1.5 TB of primary RAID storage and 12 GB of RAM. The rack also has two redundant Gigabit Ethernet switches and dual Power Distribution Units (PDUs).

After pulling out my Dremel to remove the rivets on all these servers so I could actually access the ports, I booted one separately, formatted the drives, and installed CentOS 5.3, Apache, MySQL, and PHP. Then I set up OpenX 2.8.1 and followed the recommended guidance from OpenX to use JMeter to see what sort of performance I could get from just one server; then I’d add enough servers to hit my benchmark. My initial test got me about 400 ads per minute on just one server, which was OK for a start but nowhere near what I needed. This was also my first time using JMeter, so I was skeptical that I was testing correctly. Reading more about JMeter, I realized I had only one thread running, which is a bad test because a single thread only ever has one request in flight. Throttling this up to 1000 “users” (threads), I was now humming along serving 1400 ads per minute from my test machine, in this case a new unibody 17” MacBook Pro. I noticed that it was my laptop, not the server, that was the limit here. It was time to scale up my test.
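To see why the thread count matters so much, here is a minimal sketch of a threaded load generator in Python. It is not JMeter and not the test plan used here; `run_load_test` and `fake_fetch` are hypothetical names, and `fake_fetch` just sleeps for 10 ms to stand in for one ad request so the example runs without a server. In a real test, `fetch` would issue an HTTP request against your own ad delivery URL.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(fetch, total_requests, threads):
    """Call fetch() total_requests times across a pool of worker threads.

    Returns (completed_requests, requests_per_minute).
    """
    def safe_fetch(_):
        try:
            fetch()
            return True
        except Exception:
            return False  # count failures instead of aborting the run

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(safe_fetch, range(total_requests)))
    elapsed = time.perf_counter() - start

    completed = sum(results)
    return completed, completed / elapsed * 60


def fake_fetch():
    """Stand-in for one ad request (assumption: ~10 ms of latency)."""
    time.sleep(0.01)


single = run_load_test(fake_fetch, total_requests=50, threads=1)
pooled = run_load_test(fake_fetch, total_requests=50, threads=10)

print(f"1 thread:   {single[1]:,.0f} requests/minute")
print(f"10 threads: {pooled[1]:,.0f} requests/minute")
```

With one thread, throughput is capped by request latency on the client side, so the number says little about the server. With more threads the client can keep many requests in flight, until the client machine itself becomes the limit.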

I wiped a second GSA server and installed, yet again, CentOS 5.3, Apache, MySQL, and PHP, and now also Java and JMeter. Running this test from a server, I decided to up the ante and set the thread count to 10000 users, and noticed that I was now hitting about 20,000 ads per minute. Clearly my bottleneck had been the test machine. I was happy, since not only was this an order of magnitude better, but simple math now told me that I could load balance the other servers and hit my target, though not with the 50% usage target I wanted. Before proceeding, I set up yet another server as a test server, and decided to see what would happen if I ran both against the ad server. WOW. 50,000 ads per second on one server. I had hit my target, and it seems the JMeter GUI was my bottleneck. OpenX was working so well I hit my mark on one server!
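The “simple math” for the 20,000 ads per minute result can be sketched as follows. This is an illustration only: `servers_needed` is a hypothetical helper, and it assumes throughput scales linearly with the number of servers behind a load balancer, which real deployments only approximate.

```python
import math


def servers_needed(target_per_minute, per_server_per_minute, max_utilization=1.0):
    """Servers required so that the target load stays at or below max_utilization."""
    required_capacity = target_per_minute / max_utilization
    return math.ceil(required_capacity / per_server_per_minute)


peak_per_minute = 69_444   # peak load from the earlier calculation
per_server = 20_000        # measured on one server in this test

at_full_load = servers_needed(peak_per_minute, per_server)
at_half_load = servers_needed(peak_per_minute, per_server, max_utilization=0.5)

print(f"servers to just meet the peak: {at_full_load}")
print(f"servers to stay at 50% usage:  {at_half_load}")
```

At 20,000 ads per minute per server, four servers cover the peak, but keeping the peak at 50% usage would take seven, more than the rack has.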

To complete this setup, I added a sledgehammer from Loadbalancer.org, rated at 25 million simultaneous connections, and used this to load balance 4 of the 5 servers, which serve as my “distribution” servers. The remaining one is my admin server and master database, and I have MySQL replication from the master database to the 4 distribution servers. Each of the distribution servers uses cron to trigger OpenX’s distributed processing scripts, which send statistics back to the master database so all the statistics and analytics are correct.

I tested the final, complete implementation this morning, and my testing servers gave out when I was humming along at about 5 billion ads per month with 50,000 simultaneous users. My ad network didn’t even break a sweat, using about 60% CPU on average. And THAT’S how OpenX trumps Google.

Ben Sanders
Managing [Technology] Partner
I'm the Managing Technical Partner (CTO) for Resolute Digital, LLC. I've been a professional technologist for over 20 years, and an avid practitioner of yoga for the last 10 years, hence my moniker "techyogi". I believe strongly in the value that open source brings, but am ultimately a pragmatist when it comes to technology and its use in solving real business problems in a cost-effective way.
Copyright © 2005-09 Resolute Digital LLC Digital Marketing Agency