Folding@home is based upon the science of
Molecular Dynamics
where molecular chemistry and mathematics are combined in computers
to predict how protein molecules fold in three spatial dimensions over time.
When I first heard about this, I recalled Isaac Asimov's sci-fi
masterpiece colloquially known as
The Foundation Trilogy
which is based upon the fictional branch of science called
psychohistory
where statistics, history
and sociology are combined in computers to predict humanity's
future. How did Asimov conceive of such things?
Years ago, I became infected with an
Isaac Asimov-inspired optimism about humanity's future and have since felt
the need to contribute to it. While Folding@home will not cure my
"infection of optimism", I am convinced
Dr. Asimov (who received a Ph.D. in Biochemistry
from Columbia in 1948, then was employed as a Professor of Biochemistry at the Boston University School of Medicine, staying
there until 1958) would be fascinated by something like this.
Click Asimov's picture to view his 1992 pop-ed article about
protein.
Dr. Asimov, I'm computing these protein folding
sequences in memory of you and your work.
I was considering a financial
charitable donation to Folding@home when it occurred to me that
my money would be better spent
making a knowledgeable charitable donation to all of humanity
by increasing my Folding@home computations (which will advance
medical discoveries along with associated pharmaceutical treatments).
I was already folding on a half-dozen computers anyway so all I needed
to do was purchase used video
cards on eBay.
Misfolded proteins have been associated with many diseases as well as
age-related illnesses. However, proteins are so much more complicated than
other molecules that it is not possible to begin a chemical experiment without
first providing hints to researchers about "where to look" and "what to look
for". Since the behavior of "atoms-in-molecules" as well as "atoms-between-molecules"
can be computed (Molecular
Dynamics), it makes more sense to begin with a computer analysis.
The permitted configurations can then
be passed on to experimentalist researchers.
Real-world observation:
Cooking an egg causes the clear protein (albumen) to unfold into long
strings, with the result that they now can intertwine into a tangled network
which will stiffen and scatter light (reflect white light). No chemical
change has occurred but taste, volume and color
have been altered.
Computer Solutions:
Using the most powerful single processor (CPU) available today, simulating
the folding possibilities of one large protein molecule for
one
millisecond of chemical time might require one million
days (2737 years) of computational time. However (and this is
where you come in), if the problem is sliced up then assigned to 100,000 donor
PCs via the internet, the computational requirement would drop to
10 days. Convincing your friends, relatives, and
co-workers to also do this could drop the computational requirement to
1 day.
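| chemical time in nature | simulation time, one computer (days) | simulation time, one computer (years) | simulation time, 100,000 computers | simulation time, 1 million computers |
|-------------------------|--------------------------------------|---------------------------------------|------------------------------------|--------------------------------------|
| 1 µs (0.000001 seconds) | 1,000 days                           | 2.73 years                            | 14.4 mins                          | 1.44 mins                            |
| 1 ms (0.001 seconds)    | 1,000,000 days                       | 2,737 years                           | 10 days                            | 1 day                                |
| 1 s (1.0 seconds)       | 1,000,000,000 days                   | 2,737,850 years                       | 27 years                           | 2.7 years                            |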
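The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration. It assumes, as the table does, that the work divides perfectly across donor computers with no network or scheduling overhead. The helper name `wall_clock_days` and the constant are invented for this sketch and are not part of any Folding@home software.

```python
# Idealized scaling estimate: assumes the simulation splits perfectly
# across computers (no communication or scheduling overhead).

# From the article: 1 ms of chemical time costs about one million
# single-CPU days, so 1 s of chemical time costs about one billion.
CPU_DAYS_PER_CHEMICAL_SECOND = 1_000_000_000


def wall_clock_days(chemical_seconds, computers=1):
    """Estimated days of wall-clock time (hypothetical helper)."""
    return chemical_seconds * CPU_DAYS_PER_CHEMICAL_SECOND / computers


for label, seconds in [("1 us", 1e-6), ("1 ms", 1e-3), ("1 s", 1.0)]:
    for computers in (1, 100_000, 1_000_000):
        days = wall_clock_days(seconds, computers)
        print(f"{label:>5} on {computers:>9,} computers: {days:,.4f} days")
```

Multiplying a result in days by 24 x 60 gives minutes, which is where the 14.4 and 1.44 minute entries come from.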
Additional notes for techies:
Special-purpose research computers like IBM's
BlueGene and
Roadrunner
employ 10,000 to 20,000 processors (CPUs) joined by many kilometers of
optical fiber to do
something similar.
As of December 2011, the Folding@home
project consists of
460,000 active processors
(some CPUs, some GPUs), which is equivalent to 6.4 PetaFLOPS. This means that the
million-day protein simulation problem could theoretically be
completed in about 2.17 days (1,000,000 / 460,000).
But since there are many more protein molecules
than DNA molecules, humanity could be at this for quite some time to
come.
Adding your computers to Folding@home will permanently advance mankind's
progress in protein research.
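As a quick check of the December 2011 figures quoted above, the following sketch derives the idealized completion time and the average throughput per processor. It assumes the work divides evenly across all active processors, which a real mix of CPUs and GPUs would not do exactly. The variable names are made up for this illustration.

```python
# Figures quoted in the article for December 2011.
ACTIVE_PROCESSORS = 460_000
TOTAL_PETAFLOPS = 6.4
SINGLE_CPU_DAYS = 1_000_000  # the "million-day" simulation

# Idealized wall-clock time if the work divided evenly.
days = SINGLE_CPU_DAYS / ACTIVE_PROCESSORS

# Average throughput per processor (1 PetaFLOPS = 1,000,000 GigaFLOPS).
gflops_each = TOTAL_PETAFLOPS * 1_000_000 / ACTIVE_PROCESSORS

print(f"{days:.2f} days, {gflops_each:.1f} GigaFLOPS per processor on average")
```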
Personal Comment: There are
almost 1 billion accounts registered on Facebook. Even if some of these
entries represent "companies" or "duplicate entries by people", I am still
shocked that there are fewer than one million "active" accounts at
Folding@home. We all know there are other distributed computing projects
on the internet but "fewer than one million protein-folders" is almost a crime against
humanity.
Side Note: When the
Human
Genome Project (to study human DNA) was being planned it was
thought that the task might require 100 years. However, technological change
in the area of computers, robotic sequencers, and use of the
internet to coordinate the activities of a large number of
universities (each assigned a small piece of the problem)
allowed the Human Genome Project to publish results after only 15
years.
Distributed computing projects like Folding@home and
BOINC have only been possible since 1995 when the
world-wide-web (which
was first proposed in 1989 to solve a document sharing problem at CERN)
began to make the internet popular and ubiquitous.
Distributed computing projects like Folding@home and
BOINC have only been practical since 2005 when the CPUs
in personal computers began to out-perform mini-computers and enterprise servers. This
was partly because...
AMD added 64-bit support to their x86 processor technology, calling it x86-64.
Intel followed suit, calling their 64-bit extension technology EM64T.
Intel added DDR2 support to their Pentium 4 processor line.
AMD added DDR2 support to their Athlon 64 processor line.
Since then, these technological improvements have only made computers both
faster and cheaper:
multi-core
(each core is a fully functional CPU)
chips from all manufacturers
shifting analysis from each CPU core into multiple (hundreds to
thousands) streaming processors found in high-end graphics cards
ATI (now AMD) Radeon graphics cards
NVIDIA GeForce graphics cards
development of high-performance "graphics" memory technology (e.g.
GDDR3 and GDDR4) to avoid
processing stalls caused when processors outpace the memory feeding them.
Intel's abandonment of
NetBurst which meant
a return to shorter instruction pipelines starting with
Core2 (Note that AMD never went to longer pipelines; a long pipeline is only
efficient when running a static CPU benchmark - not running code in real-world operating
systems like Windows, UNIX, and Linux)
Intel replacing 20-year old
FSB technology
with a new approach called QPI (QuickPath
Interconnect). See their
Core i7 chips. Note: this technology was
invented by DEC for their
Alpha CPUs and named CSI (Common System Interconnect). Compaq bought DEC
in 1998. The Alpha Engineering team was sold to Intel in 2001 during the
merger discussions between HP and Compaq. The merger was completed in 2002.
The AMD equivalent of QPI is called
HyperTransport
which has been described as a multipath Ethernet targeted for use within a
computer system.
As is true in any "demand vs. supply"
scenario, most consumers didn't need the additional computing power which
meant that chip manufacturers had to drop their prices just to keep the
computing marketplace moving. This
was good news for people setting up "folding
farms".
Shifting from brute-force "Chemical Equilibrium" algorithms to
techniques involving
Bayesian statistics and
Markov Models will
enable some
exponential speedups.
Folding@home: the most powerful and energy
efficient supercomputer in the world -
Vijay Pande (Stanford University) gave this one hour lecture
at PARC (Palo Alto Research Center) on 2009-01-08
This "folding knowledge" will be used to develop new drugs for treating
diseases such as:
ALS ("Amyotrophic Lateral Sclerosis" a.k.a. "Lou Gehrig's Disease")
Alzheimer's
Disease
Plaques which contain misfolded peptides called amyloid beta
are formed in the brain many years before the clinical signs
of Alzheimer's are observed. Together, these plaques and neurofibrillary
tangles form the pathological hallmarks of the disease
Cancer
& p53
P53 is the suicide gene
involved in apoptosis (programmed cell death - something
necessary in order for your immune system to kill cancer cells)
CJD (Creutzfeldt-Jakob Disease)
the human variation of mad cow disease
Huntington's
Disease
Huntington's disease is caused by a trinucleotide repeat expansion
in the Huntingtin (Htt) gene and is one of several polyglutamine
(or PolyQ) diseases. This expansion produces an altered form of
the Htt protein, mutant Huntingtin (mHtt), which results
in neuronal cell death in select areas of the brain. Huntington's
disease is a terminal illness.
Osteogenesis
Imperfecta
Normal bone growth is a yin-yang
balance between osteoclasts and osteoblasts. Osteogenesis Imperfecta
occurs when bone grows without sufficient or healthy collagen
(protein)
Parkinson's
Disease
The mechanism by which the brain cells in Parkinson's are lost
may consist of an abnormal accumulation of the protein alpha-synuclein
bound to ubiquitin in the damaged cells.
Ribosome
& antibiotics
A ribosome is a protein producing
organelle found inside each cell
ATI claims that science software will run between 20 and 40 times faster
on a GPU (graphics processing unit) than on a traditional general-purpose CPU.
That is 2,000% to 4,000% of CPU throughput, but ATI didn't tell us if they were
comparing to Pentium-3 (which only supported MMX/SSE) or Pentium-4
(where early models also contained SSE2 support while later models
also contained SSE3 support).
An October 2006 trial by Stanford indicated a science speedup of over 70 times (7,000%)
when tuned for the ATI-x1950 graphics card.
The GPU2 client running on an HD-3870 is rumored to increase analysis
throughput (relative to a CPU client) by over 105 times (10,500%).
Some of this is because of the new ATI hardware while some is due to
replacing DirectX with
CAL
Modern computers can do 3d graphics two different ways:
in software using a general purpose CPU (central processing
unit) like Intel's Pentium or AMD's Athlon, etc.
in specialized hardware using a special purpose GPU (graphics
processing unit) like those found in:
ATI graphics cards
NVIDIA graphics cards
Sony's PS3 (PlayStation 3) system which can achieve speeds of
100 GigaFLOPS per console
Scalar vs. Vector
CPUs (central processing units) are scalar processors which
execute instructions sequentially
Some RISC processors can exploit certain kinds of
instruction-level parallelism. In some cases they can execute
instructions out-of-order.
Some CISC processors support SIMD (single instruction - multiple
data) instructions for certain applications involving DSP (digital
signal processing) or
multi-media.
GPUs (graphics processing units) are PC-based vector
processors which easily execute parallel operations
ATI's x1950 was released in 2006 with 36 processors (pixel shaders)
ATI's HD-4870 was released in 2008 with 800 processors
(unified shaders)
This is an increase of 22 times in only 2 years. (Moore's Law
expects a doubling every 18 months)
Since graphics cards have their own large memory systems, they
should be thought of as private computer systems within your
computer. Also, this private memory is not going to be trounced by
interrupting devices etc.
A later release, synchronized with the delayed release of Windows 7, doubled the processor count:
1600/800 = a 2.0-fold increase in one year (growth is currently exceeding Moore's Law)
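To put the processor counts quoted above next to Moore's Law, here is a small Python sketch. It assumes the 18-month doubling period mentioned in this article and treats shader counts as if they were comparable to transistor counts, which is a loose analogy at best. The function name `moore_factor` is invented for this illustration.

```python
def moore_factor(months, doubling_months=18):
    """Growth factor predicted by one doubling every `doubling_months`."""
    return 2 ** (months / doubling_months)


# (label, earlier count, later count, months elapsed) as quoted in the article
comparisons = [
    ("x1950 (36) -> HD-4870 (800)", 36, 800, 24),
    ("800 -> 1600 processors", 800, 1600, 12),
]

for label, before, after, months in comparisons:
    observed = after / before
    predicted = moore_factor(months)
    print(f"{label}: observed {observed:.1f}x, Moore's Law predicts {predicted:.2f}x")
```

In both cases the observed factor is larger than the predicted one, which is the point being made about GPU growth.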
all clients may stop working when a certain date is reached. This is
normal behavior and you only need to download a newer client.
the following events will cause the Folding@home ATI-GPU-Client to exit:
doing a 3-finger-salute (ctrl-alt-del) to bring up the Windows
Task Manager (always fails with an ATI-x1650 on single CPU; never
fails with an ATI-x1950-Pro on a dual core CPU). This means that you
must bring up the Task Manager by performing a right-click on the task
bar then clicking on the "Task Manager" item.
the windows screen saver (for some reason it takes over the whole
graphics subsystem with an INIT). This means you should set:
Screen Saver: None
Monitor Power: Turn off monitor power after 5 minutes
playing any full screen game
the following events may cause the Folding@home ATI-GPU-Client to
permanently slow to a crawl:
using Windows Media Player to watch a video. You will need to restart your client in order to recover from this.
New (2011-July) ATI Graphics Card Caveat:
Some distributed science projects require double-precision floating-point
but Folding@home is not one of them. That said, I just replaced a
defective graphics card with a brand new ATI
HD-5570 graphics card and none of the official GPU clients seem to
support it. While poking around the
Folding Forum I came up with a beta GPU3 client which does work. Click
the following link for more information:
http://foldingforum.org/viewtopic.php?f=59&t=14683&p=144648
If
you do require double-precision floating-point then you had better do some
research before trekking to the store:
Multicore: It's No Game (June 9, 2007)
http://www.ddj.com/architect/199902753
Quote: In total, approximately 250,000 PlayStation3 consoles are
contributing about 400 TFLOPs of compute power, making it the number one
compute-resource contributor to Folding@home - more than doubling that from
Windows-based PCs.
Sony Studies Commercial PlayStation 3 Supercomputing Grid (April 11, 2007)
http://www.ddj.com/architect/199000499
Quote: On Wednesday, for example, 20,000 of the 200,000 PS3 users
who have signed up for the Folding@home project were online, delivering
a combined processing speed of 267 teraflops, Dave Karraker, spokesman for
Sony Computer Entertainment America, told InformationWeek. By comparison,
the 200,000 PCs online were producing a combined speed of 240 teraflops.
A teraflop is a trillion mathematical computations, called floating-point
operations, per second.
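The two quotes above become easier to compare when reduced to an average per device. The sketch below does only that division. It assumes every device counted in a quote was contributing at the same time, which the June 2007 quote does not actually state. The function name `gigaflops_per_device` is made up for this illustration.

```python
# (description, number of devices, combined teraflops) as quoted above
reports = [
    ("PS3 consoles online (April 2007)", 20_000, 267),
    ("PCs online (April 2007)", 200_000, 240),
    ("PS3 consoles (June 2007)", 250_000, 400),
]


def gigaflops_per_device(devices, teraflops):
    """Average throughput per device; 1 teraflop = 1,000 gigaflops."""
    return teraflops * 1_000 / devices


for label, devices, teraflops in reports:
    average = gigaflops_per_device(devices, teraflops)
    print(f"{label}: {average:.2f} gigaflops each")
```

On the April 2007 numbers, an online PS3 averaged roughly eleven times the throughput of an online PC.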
Charlie Rose interviews Nvidia president and CEO, Jen-Hsun Huang (pronounced:
gen-son wang). This 38 minute interview from February 2009 features many
GPU-related topics including CUDA (Compute Unified Device Architecture).
Quote: "The CPU is
irrelevant; now it's all about the GPU"
Note: when preparing to run the Folding@home GPU client for the first
time, do not waste your time attempting to use the UPDATE DRIVER tool from
within the DEVICE MANAGER. Instead, go to the NVIDIA web site and download a
stand-alone driver application. The driver must contain the CUDA extension
and should be version to 177.35
GPU Programming (not required to use
folding-at-home)
Scalar vs. Vector
CPUs (central processing units) are scalar processors
which execute instructions sequentially
Some RISC processors can exploit certain kinds of
instruction-level parallelism. In some cases they can
execute instructions out-of-order.
Some CISC processors support SIMD (single instruction -
multiple data) instructions for certain applications
involving DSP or multi-media.
GPUs (graphics processing units) are PC-based vector
processors which easily execute parallel operations
The ATI x1950 was released in 2006 with 36 processors
(pixel shaders)
The ATI HD-3870 was released in 2007 with 320 processors
(unified shaders)
The ATI HD-4870 was released in 2008 with 800 processors
(unified shaders)
This is an increase of 22 times in only 2 years.
(Moore's Law expects transistors to double every 18 months)
Since graphics cards have their own large memory
systems, they should be thought of as private computer
systems within your computer. Also, this private memory is
not going to be trounced by interrupting devices etc.
In many cases vector processors are easily two orders of
magnitude (100 times) more powerful than scalar processors.
OpenMM (https://simtk.org/home/openmm), a toolkit for molecular
mechanics/molecular dynamics, is the next big thing in computers. While
modern CPUs tend to support one to four cores and solve problems
sequentially, modern GPUs support 300-800
cores and compute in parallel. Modern GPUs do not (yet) seem to be
limited in the same way that Moore's Law affects today's CPUs.
http://go.microsoft.com/fwlink/?LinkId=4544
- Windows Server 2003 Resource Kit Tools (also works with XP) includes cool stuff like: imagecfg, sleep (used to pause a script), timezone, etc.
Stopped services may only be deleted from DOS like so:
sc query neil369
sc delete neil369
Console Client Startup
Script for GPU1
caveat: no longer required with the newer GPU2 clients, which are associated
with ATI cards starting with the HD-2000 series
@echo off
rem Restart loop for the Folding@home GPU console client.
rem Requires sleep.exe (see the notes below).
echo "================================="
echo "GPU console client control script"
echo "================================="
echo "sleeping 2 minutes while Windows is starting"
echo "you may wish to start TASK-MANAGER to set CPU Affinity"
rem Change to the client directory first so that sleep.exe and the
rem client are both found there (/d also switches drives).
cd /d c:\folding-0
@echo on
sleep 120
:myloop
echo "starting the GPU console client"
fah6-win-gpu-console.exe -local
echo ">>> the console has just exited <<<"
echo "did someone do a 3-finger salute?"
echo "sleeping 1 minute while the system stabilizes"
sleep 60
goto myloop
rem ==============================
Notes:
"sleep" is an application found in the "Windows Server 2003 toolkit"
which can be downloaded from here: http://go.microsoft.com/fwlink/?LinkId=4544
Make sure you place a copy of "sleep.exe" in your working directory (or
invoke it by its full path name)
if this is a new-client installation then you must first run the client
manually to configure the settings file
as of 2007-12-31 you should make sure that your screen saver is set
to "none" in order to avoid a reset of the graphics card
an alias to this script should be placed in the following location:
Start >> Programs >> Startup
If you set the affinity of the CMD process while it is still waiting
to start, you then won't need to set the affinity of any daughter tasks.
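For readers who would rather not depend on "sleep.exe", the same restart loop could be sketched in Python. This is an illustration under stated assumptions, not part of the Folding@home distribution. The function name `keep_client_running` and all of its parameters are invented here. The executable name, the -local flag, and the c:\folding-0 directory are taken from the batch script above.

```python
import subprocess
import time


def keep_client_running(command, cwd=None, restart_delay=60,
                        max_runs=None, run=subprocess.call, sleep=time.sleep):
    """Start `command`, then start it again each time it exits.

    `max_runs`, `run`, and `sleep` are hypothetical hooks that exist only
    so the loop can be bounded or exercised without a real client.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        print("starting the GPU console client")
        exit_code = run(command, cwd=cwd)
        runs += 1
        print(f">>> the console has just exited (code {exit_code}) <<<")
        sleep(restart_delay)  # let the system stabilize before restarting
    return runs


# Example call mirroring the batch script (runs until interrupted):
#   time.sleep(120)  # wait while Windows finishes starting
#   keep_client_running(
#       [r"c:\folding-0\fah6-win-gpu-console.exe", "-local"],
#       cwd=r"c:\folding-0",
#   )
```

The caveats in the notes above (screen saver, first-time configuration, CPU affinity) would apply to this version as well.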
BOINC (Berkeley Open Infrastructure for Network Computing)
BOINC is a framework in which
you can support one or more projects. If you can't pick between several
good causes, then join several; your BOINC
manager will switch between them every hour (this interval is adjustable). In my
case I actively support POEM, Rosetta, and Docking.
http://boinc.bakerlab.org/rosetta/
is the home of Rosetta@home
which operates through the BOINC framework. Their graphics screen-saver is one
very effective way to help visualize "what molecular dynamics is all about". Science teachers should show this to their students.
I'm sure everyone already knows that a computer rendering
beautiful graphical displays is doing less science. That said,
humans are visual creatures and graphical displays have their place
in our society. Except for some public locations, all clients
should be running in
non-graphical mode so that more system resources are diverted to
protein analysis.
Five questions for Rosetta@home: How Rosetta@home helps cure cancer,
AIDS, Alzheimer's, and more
World Community Grid (WCG, http://en.wikipedia.org/wiki/World_community_grid)
is an effort to create the world's largest public computing grid
to tackle scientific research projects that benefit humanity. Launched
2004-11-16, it is funded and operated by IBM with client software currently
available for Windows, Linux, Mac-OS-X and FreeBSD operating systems.
They encourage their employees and customers to do the same.
Personal Comment: I wonder why Hewlett-Packard has not followed IBM's lead.
Up until now I always thought IBM was the template of uber-capitalism but it
seems that the title of "king of profit by the elimination of seemingly
superfluous expenses" goes to HP. Don't they realize that IBM's effort
in this area is done under IBM's advertising budgets? Just as IBM's 1990s
foray into chess-playing systems (e.g. Deep Blue) led to increased sales as
well as increased share prices, one day IBM will be able to say "IBM
contributed to treatments for human diseases including cancer".