From a certain point of view, scaling techniques are well established and widely accepted. So instead of relying on web links and articles, I'd read books on the topic before starting the project.
I suggest:
"Scalable Internet Architectures" - http://www.amazon.com/Scalable-Internet-Architectures-Theo-Schlossnagle/dp/067232699X/ref=sr_1_1?ie=UTF8&s=books&qid=1274462743&sr=8-1
"The Art of Scalability" - http://www.amazon.com/Art-Scalability-Architecture-Organizations-Enterprise/dp/0137030428/ref=sr_1_3?ie=UTF8&s=books&qid=1274462743&sr=8-3
"The Art of Capacity Planning" - http://www.amazon.com/Art-Capacity-Planning-Scaling-Resources/dp/0596518579/ref=pd_bxgy_b_text_b
I think this is a hard one to learn without working on a big project with senior people. I struggled with it for years and still do, but I've gotten a little better thanks to senior developers guiding me. Besides reading books, attending lectures, and listening to talks online, I think you need to find a sizable problem and build a solution from scratch. My work with senior developers has shown me that designing on paper and on the whiteboard is essential to understanding the problem and eventually building the solution. The patterns will reveal themselves as you go through this exercise, if any patterns are needed at all. This is definitely not something you will learn overnight; you learn it over time by getting your hands dirty. If you have not visited infoq.com, please do so. It has tons of talks by experts, and I have been able to apply some of what I learned there on the job. A quick link to how YouTube was built: http://www.youtube.com/watch?v=ZW5_eEKEC28&feature=youtu...
Really good talk with lots of jewels. Another good book for you: http://www.amazon.com/Scalable-Internet-Architectures-Theo-S...
Buy this. http://www.amazon.com/Scalable-Internet-Architectures-Developers-Library/dp/067232699X
The database isn't your bottleneck.
Check your browser carefully.
For each page of HTML you're sending (on average) 8 other files, some of which may be quite large. These are your JS, CSS, graphics, etc.
The actual performance bottleneck is the browser requesting those files and accepting the bytes s... l... o... w... l... y...
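A rough back-of-the-envelope sketch of that request mix, assuming the average of 8 supporting files per HTML page quoted above. The page-view count is a made-up number for illustration, and the function name is my own:

```python
# Back-of-the-envelope request mix for HTML pages plus their supporting files.
# ASSETS_PER_PAGE comes from the "8 other files" average above;
# the page_views value used below is a hypothetical figure for illustration.
ASSETS_PER_PAGE = 8


def request_mix(page_views: int, assets_per_page: int = ASSETS_PER_PAGE) -> dict:
    """Split total requests into dynamic HTML requests and static asset requests."""
    dynamic = page_views
    static = page_views * assets_per_page
    total = dynamic + static
    return {
        "dynamic": dynamic,
        "static": static,
        "static_share": static / total,
    }


mix = request_mix(page_views=1000)
print(f"dynamic: {mix['dynamic']}, static: {mix['static']}, "
      f"static share: {mix['static_share']:.0%}")
```

This ignores browser caching, which would lower the static count for repeat visitors, but it shows why most requests never need to touch Django or the database.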
To scale, then, do this.
Use multiple front-ends balanced with a pure software solution like Wackamole: http://www.backhand.org/wackamole/
Use proxy servers like Squid to send the "other" files. They're largely static, and downloading them to the client is where about 7/8ths of the work is done. Don't scrimp on getting these right.
Use multiple, concurrent mod_wsgi/Django to create the -- rare -- piece of dynamic HTML based on DB queries. Be sure that mod_wsgi is in daemon mode so that you can have multiple Django servers available to Apache. Build as many of these as you need. They're all identical, all in parallel, and all shared by Wackamole.
Use a single, fast database like MySQL for the few things that must come from a database. MySQL will make use of multiple cores on its server, so it will scale reasonably well without you having to do anything other than buy memory. Put this on a separate box, all by itself, dedicated and tuned for just this.
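In Django terms, putting MySQL on its own box mostly comes down to the HOST entry in the settings file. A minimal sketch of that fragment follows; the host name, database name, account, and environment variable are placeholders I've assumed, not values from any real setup:

```python
# settings.py fragment (sketch): point Django at a MySQL server on a separate box.
# Every value below is a placeholder assumption; substitute your own.
import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "myproject",                      # hypothetical database name
        "USER": "myproject_app",                  # hypothetical account
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        "HOST": "db.internal.example",            # the dedicated database box
        "PORT": "3306",                           # MySQL's default port
    }
}
```

Because every Django daemon loads the same settings, they all end up sharing the one database server.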
You'll find that this scales nicely. You'll find that the load is shared nicely between Squid, Apache, the Django daemons and the actual database. You'll also find that each part of the load (from the boring static parts to the interesting database query) happens separately and concurrently.
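Each of the identical mod_wsgi daemons loads the same WSGI entry point. The sketch below is a bare WSGI callable, not a real Django project (Django would supply its own `application` object); it only shows the shape of what each daemon runs. The response text and the `/articles/` path are placeholders:

```python
# Minimal WSGI callable (sketch). mod_wsgi looks for a module-level object
# named "application"; a Django project would expose Django's handler instead.
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Return a tiny dynamic page; stands in for the rare DB-backed HTML."""
    path = environ.get("PATH_INFO", "/")
    body = f"<html><body>Dynamic page for {path}</body></html>".encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


# Exercise the callable without a web server, using a fake request environment.
environ = {"PATH_INFO": "/articles/"}
setup_testing_defaults(environ)  # fills in the remaining required WSGI keys

captured = {}


def record_start_response(status, headers):
    """Stand-in for the server's start_response; just records what it was given."""
    captured["status"] = status
    captured["headers"] = dict(headers)


chunks = application(environ, record_start_response)
print(captured["status"])
print(b"".join(chunks).decode("utf-8"))
```

In daemon mode, the number of processes running this callable is set on the Apache side with mod_wsgi's WSGIDaemonProcess directive, so adding capacity does not require changing the Python code.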
Finally, buy Schlossnagle's book. http://www.amazon.com/Scalable-Internet-Architectures-Theo-Schlossnagle/dp/067232699X
From a certain point of view, scaling techniques are well established and widely accepted. So instead of relying on web links and articles, I'd read books on the topic before starting the project.
I suggest:
The Art of Scalability http://www.amazon.com/Art-Scalability-Architecture-Organizat...
Scalable Internet Architectures http://www.amazon.com/Scalable-Internet-Architectures-Theo-S...
Enterprise Cloud Computing http://www.amazon.com/Enterprise-Cloud-Computing-Architectur...
http://www.amazon.com/Scalable-Internet-Architectures-Develo...
Hope this helps.