Folding@home is biological
research based upon the science of Molecular Dynamics,
where molecular chemistry and mathematics are combined in computer-based models
to predict how protein molecules might fold in three spatial dimensions over time.
When I first heard about this, I recalled Isaac Asimov's sci-fi
magnum opus colloquially known as
The Foundation Trilogy
which introduced the fictional branch of science called psychohistory,
where statistics, history
and sociology are combined in computer-based models
to predict humanity's future.
Years ago I became infected with an
Asimov inspired optimism about humanity's future and have since felt
the need to contribute to it. While Folding@home will not cure my
"infection of optimism", I am convinced Dr. Asimov (who received a Ph.D. in Biochemistry
from Columbia in 1948 then was employed as a Professor of Biochemistry at the Boston University School of Medicine until 1958 when his writing workload became too large) would have been fascinated by something like this.
Dr. Asimov, I'm computing these protein folding
sequences in memory of you, and your work.
I was considering a financial
charitable donation to Folding@home when it occurred to me that
my money would be better spent by:
Making a knowledgeable charitable donation to all of humanity
by increasing my Folding@home
computations (which will advance medical discoveries along with
associated pharmaceutical treatments thus lengthening human
life). I was already folding on
a half-dozen computers anyway so all I needed to do was purchase
used video cards on eBay.
Convincing others (like you) to follow my example.
My solitary folding efforts will have little effect on humanity's
future. Together we can make a real difference.
Quote: the CPU is irrelevant. Now it's all about the GPU.
Recommendations: The GTX-560 is the most cost-effective
hardware for contributing to protein folding science. In
subsequent years you should buy anything ending in a 60 (GTX-660, GTX-760,
Misfolded proteins have been associated with numerous diseases and age-related illnesses. However, proteins are so much larger and
complicated than smaller molecules that it is not possible to begin a
chemical experiment without first providing hints to researchers about
where to look and what to look
for. Since the behavior of atoms-in-molecules (Computational
Chemistry) as well as atoms-between-molecules (Molecular
Dynamics) can be modeled, it makes more sense to begin with a
computer analysis. Permitted configurations can then be passed on to
Cooking an egg causes the clear protein
(albumen) to unfold into long strings, with the result that they now can
intertwine into a tangled network which will stiffen then scatter light (appear white). No chemical
change has occurred but taste, volume and color
have been altered.
A short "protein article"
by Isaac Asimov was published in 1993, shortly after his death.
Single CPU Systems
Using the most powerful single core processor (CPU) available today, simulating
the folding possibilities of one large protein molecule for
one millisecond of chemical time might require one million
days (about 2,738 years) of computational time. However (and this is
where you come in), if the problem is sliced up then assigned to 100,000
personal computers over the internet, the computational requirement would drop to
ten days. Convincing friends, relatives, and
employers to also do the same would drop the computational requirement further.
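The arithmetic in the paragraph above can be checked with a short Python sketch. It assumes the work divides perfectly evenly with no network or scheduling overhead, which real distributed projects never quite achieve; the function name `days_to_finish` is made up for illustration.

```python
def days_to_finish(single_cpu_days: float, machines: int) -> float:
    """Idealized wall-clock days when the work is split evenly across machines.

    Assumption: perfect division of labor with zero overhead.
    """
    if machines < 1:
        raise ValueError("machines must be at least 1")
    return single_cpu_days / machines


SINGLE_CPU_DAYS = 1_000_000  # figure quoted in the text

# One CPU working alone, expressed in years
print(f"{days_to_finish(SINGLE_CPU_DAYS, 1) / 365.25:.0f} years on one CPU")

# The same job spread over 100,000 personal computers
print(f"{days_to_finish(SINGLE_CPU_DAYS, 100_000):.0f} days on 100,000 PCs")
```

Doubling the number of participating machines halves the idealized completion time, which is why recruiting more folders matters.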
chemical time in nature
1 million computers
1 S (1.0 seconds)
1 mS (0.001 seconds)
1 uS (0.000001 seconds)
Additional information for techies, hackers and science buffs
Special-purpose research computers like IBM's
employ 10,000 to 20,000 processors (CPUs) joined by many kilometers of
optical fiber to do
something similar in one location. (caveat: Roadrunner is a hybrid
technology employing CPUs and special non-graphic GPUs called
As of May 2015, the Folding@home
project consists of
computers (some CPUs, some GPUs) which is equivalent to 32,132 TeraFLOPS. This means that
the million-day protein simulation problem could theoretically be completed
in about 6.8 days (1,000,000/146,466), but since there are many more protein molecules
than DNA molecules, humanity could be at this for quite some time.
Adding your computers to Folding@home will permanently advance mankind's
progress in protein research.
These numbers used to be much higher but distributed-computing lost some
contributors when a large fraction of society shifted from PCs to
There are almost 1 billion accounts registered on Facebook. Even
if some of these represent organizations, I am shocked that
there are less than one million "active processors" at
folding-at-home. We all know there are other distributed
computing projects on the internet but "less than one
million protein-folders" seems a crime against
When the Human Genome Project (to study human DNA) was being planned, it was
thought that the task may require 100 years. However, technological change
in the area of computers, robotic sequencers, and use of the
internet to coordinate the activities of a large number of
universities (each assigned a small piece of the problem),
allowed the human genome project to publish results after only 15
years, a roughly 6.7-fold increase in speed (100/15).
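The speed-up implied by those two figures (100 years estimated, 15 years actual) can be worked out in a few lines of Python. This is only a sketch of the arithmetic; the function names are made up for illustration. Note that an "N-fold speed-up" and a "percent increase" are different numbers for the same pair of durations.

```python
def speedup_factor(estimated_years: float, actual_years: float) -> float:
    """How many times faster the work finished than first estimated."""
    return estimated_years / actual_years


def percent_increase(estimated_years: float, actual_years: float) -> float:
    """The same comparison expressed as a percentage increase in speed."""
    return (speedup_factor(estimated_years, actual_years) - 1.0) * 100.0


factor = speedup_factor(100, 15)
increase = percent_increase(100, 15)
print(f"{factor:.1f}-fold speed-up, or a {increase:.0f}% increase in speed")
```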
Distributed computing projects like Folding@home and
BOINC have only been possible since 1995 when the World Wide Web (which
was first proposed in 1989 to solve a document sharing problem amongst
scientists at CERN in Geneva) began to
make the internet both popular and ubiquitous.
Processor technology was traditionally defined like this:
(both are SIMD but SSE uses its own floating point registers)
2001: SSE2 was
implemented on Pentium 4 from Intel
2004: SSE3 was
implemented on Pentium 4 Prescott from Intel
2006: SSE4 was
implemented on Intel Core and AMD K10
AVX (Advanced Vector Extensions) proposed by Intel + AMD but
not seen until 2011
this technology employs 256-bit
But GPUs (graphics processing units) take vector processing to a whole
new level. Why? A $150.00
graphics card can now equip your
system with 1500-2000 streaming processors and 2-4 GB of additional
high speed memory.
AMD will manufacture an 8-core APU in 2013 which will be
targeted at Sony's PS4 (PlayStation 4) and Microsoft's
XBOX-One (a.k.a. XBOX-720).
I've been in the computer hardware-software business for a
while now but can confirm that computers have only started to
get real interesting again this side of 2007 with the releases of CUDA, OpenCL,
Distributed computing projects like Folding@home and
BOINC have only been practical since 2005 when the CPUs
in personal computers began to out-perform mini-computers and enterprise servers. This
was partly because...
AMD added 64-bit support to their x86 processor technology calling
Intel followed suit calling their 64-bit extension technology
Since then, the following list of technological improvements has only made computers both
faster and cheaper:
(each core is a fully functional CPU)
chips from all manufacturers
shifting analysis from each CPU core into multiple (hundreds to
thousands) streaming processors found in high-end graphics cards
ATI (now AMD) Radeon graphics cards
NVidia GeForce graphics cards
development of high performance "graphics" memory technology (e.g.
GDDR5) to bypass
processing stalls caused when processors are too fast. Note that GDDR5
will represent main memory in the not-yet-released
Intel's abandonment of
NetBurst which meant
a return to shorter instruction pipelines starting with
Core2 Comment: AMD never went to longer pipelines; a long
pipeline is only efficient when running a static CPU benchmark for marketing
purposes, not when running code in the real world where i/o events interrupt the
primary foreground task (science in our case)
HP preferred Itanium2
(jointly developed by HP and Intel) so announced their intention to
gracefully shut down Alpha (it would take more than a year to boot
OpenVMS on Itanium2 and another year for big-system qualification
Alpha technology (which included CSI) was immediately sold to Intel
approximately 300 Alpha
engineers were transferred to Intel between 2002 and 2004
CSI morphed into QPI (some industry watchers say that Intel
ignored CSI until the announcement by AMD to go with the
industry-supported technology known as HyperTransport
The remainder of the industry went with a non-proprietary technology called
which has been described as a multipoint Ethernet for use within a
As is true in any "demand vs. supply"
scenario, most consumers didn't need the additional computing power which
meant that chip manufacturers had to drop their prices just to keep the
computing marketplace moving. This
was good news for people setting up "folding
farms". Something similar is happening today with computer
systems since John-q-public is shifting from "towers and desktops" to "laptops
and pads". This is causing the price of towers and graphics cards to plummet
ever lower. You just can't beat the price-performance ratio of a
Core-i7 motherboard hosting an NVidia graphics card. (prediction:
laptops and pads will never ever be able to fold as well as a tower; towers will
always be around in some form; low form-factor desktops might become extinct)
Shifting from brute-force "Chemical Equilibrium" algorithms to
Bayesian statistics and
Markov Models will
Liquid Water This diagram
depicts an H2O molecule loosely connected to four others
Question: After perusing the
periodic table for a moment you will soon realize that the
molecular mass of
water (H2O) is ~18 while the molecular mass of oxygen (O2) is ~32,
carbon dioxide (CO2) is ~44 and ozone (O3) is
~48. So why is H2O in a liquid state at room temperature while other slightly heavier molecules take the form of a gas?
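Compound              Molecular Mass   State at Room Temperature
water (H2O)           ~18              liquid
oxygen (O2)           ~32              gas
carbon dioxide (CO2)  ~44              gas
ozone (O3)            ~48              gas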
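The molecular masses quoted in the question can be reproduced from a periodic table with a small Python sketch. The atomic masses below are standard rounded values; the dictionary layout and the function name `molecular_mass` are illustrative choices, not part of any chemistry library.

```python
# Standard atomic masses (rounded), in atomic mass units
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999}

# Each compound is described as {element: number of atoms}
COMPOUNDS = {
    "H2O": {"H": 2, "O": 1},
    "O2": {"O": 2},
    "CO2": {"C": 1, "O": 2},
    "O3": {"O": 3},
}


def molecular_mass(formula: dict) -> float:
    """Sum the atomic masses of every atom in the molecule."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


for name, formula in COMPOUNDS.items():
    print(f"{name}: {molecular_mass(formula):.1f}")
```

Water comes out as the lightest of the four, which is what makes its liquid state at room temperature surprising.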
Short answer: In the case of
an H2O molecule,
two hydrogen atoms are covalently bonded to one oxygen atom,
but the oxygen atom also carries two unshared (lone) pairs of electrons whose repulsion causes the
water molecule to bend into a V shape. The oxygen side of the bend carries a partial
negative charge which is exposed to the world and allows a weak
connection to the partially positive hydrogen atom of a neighboring H2O molecule (water
molecules weakly sticking to each other form a liquid). These weak
connections are called
Van der Waals
Van der Waals did all his computations with pencil and
paper long before the computer was invented but it was only possible because
the molecules involved were small.
Caveat: The compound table above was only meant to get you thinking
because Molecular Mass is not all there is to the picture. Getting back to the
periodic table for a moment will show:
all elements in column 1 (except hydrogen) are solid at room temperature
all elements in column 8 (helium to radon) are gaseous at room temperature
Half the elements in row 2 starting with Lithium (atomic number 3)
and ending with Carbon (atomic number 6), as well as three quarters of
row 3 starting with Sodium (atomic number 11) and ending with Sulphur (atomic number 16), are
solid at room temperature
I will leave it to you to determine why.
hint: the answer also involves
the repulsive force between
electrons as well as the attractive force between electrons and protons.
Proteins come in many shapes and sizes. Here is a
very short list:
This "folding knowledge" will be used to develop new drugs for treating
diseases such as:
ALS ("Amyotrophic Lateral Sclerosis" a.k.a. "Lou Gehrig's Disease")
Plaques, which contain misfolded peptides called amyloid beta,
are formed in the brain many years before the signs
of this disease are observed. Together, these plaques and neurofibrillary
tangles form the pathological hallmarks of the disease
P53 is the suicide gene
involved in apoptosis (programmed cell death - something
necessary in order for your immune system to kill cancer cells)
CJD (Creutzfeldt-Jakob Disease)
the human variation of mad cow disease
Huntington's disease is caused by a trinucleotide repeat expansion
in the Huntingtin (Htt) gene and is one of several polyglutamine
(or PolyQ) diseases. This expansion produces an altered form of
the Htt protein, mutant Huntingtin (mHtt), which results
in neuronal cell death in select areas of the brain. Huntington's
disease is a terminal illness.
Normal bone growth is a yin-yang
balance between osteoclasts and osteoblasts. Osteogenesis Imperfecta
occurs when bone grows without sufficient or healthy collagen
The mechanism by which the brain cells in Parkinson's are lost
may consist of an abnormal accumulation of the protein alpha-synuclein
bound to ubiquitin in the damaged cells.
A ribosome is a protein producing
organelle found inside each cell
All my SETI credits were
accumulated in the early 2000s on a few DEC Alpha servers. I stopped contributing to SETI
(sorry SETI) when I
came to the realization that biological and climate science would bring immediate
benefits to humanity.
Rosetta, POEM, and
Docking are protein related.
CPDN (climateprediction.net) is a group
running "what if" scenarios on climate models.
WCG (World Community Grid) is a unified science group.
Current Folding-at-Home Stats (updated every 3 hours)
faster because I bought a bunch of ATI graphics cards on eBay
faster because I upgraded my ATI graphics cards to GPU2
slower because non-GPU resources were diverted to BOINC (it's
all about the science)
slower because one PC motherboard has burned out (will not
repair or replace)
slower because one graphics card burned out
a little faster - upgraded one graphics card from HD-3800 to
faster because of experiments with an SMP client on a quad-core
Faster because: 1. upgraded all version 6 clients to version 7.1.52 2. upgraded two graphics cards from HD-3870 to HD-6570 3.
moved my single SMP client from my Intel Core2 Quad Q6600 system to a
system 4. added an SMP client to a second Core-i7 system
Faster because two Windows-XP machines were changed from
AMD/ATI HD-3870 to NVidia GTX-560 (they hadn't folded for a couple of months
because AMD deleted my OpenCL driver. Why? AMD no longer supports OpenCL on
The NVidia cards are speedsters (AMD should have never driven me
to their competition)
The majority of these impressive times come from two NVidia GTX-560 cards (and two Intel Core-i7
CPUs; each running an SMP client configured for 6 processors rather
A little slower because: 1. I experienced a 24-hour internet outage
2. Sony discontinued support for folding-at-home (my PS3 had been
folding since 2007)
A little slower because of problems at the Stanford website over
A little longer; power supply problems with one PC
A little faster (clicked over at 23:00)
Probably should have been 16 days (see previous line)
A little longer; one PC had stability problems
A few folding slots failed on Windows-Vista machines; fixed by
updating the AMD drivers
A little slower because of problems at the Stanford
A little slower; one AMD HD-6450 graphics card burned out (replaced
with NVidia GTX-660)
A little longer but I don't know why (perhaps the problems have
A bit faster because my brother-in-law gave me an old P4 system
running Windows-XP. So I bought a new NVidia GTX-960 then added this machine
to my little "folding farm"
we're cooking now
Stream Computing via graphics cards
Executive Summary: while a single core Pentium-class CPU
also provides one streaming (vector) processor
under marketing names like MMX and SSE, one graphics card can provide
hundreds to thousands.
Modern computers can do 3d graphics two different ways:
in software using a general purpose CPU (central processing unit)
like Intel's Pentium or AMD's Athlon
in specialized hardware using a special purpose GPU (graphics
processing unit) like those found in:
NVidia graphics cards
ATI graphics cards
Sony's PS3 (PlayStation 3) system which can achieve speeds of
100 GigaFLOPS per console
Microsoft's XBOX-360 game console
Scalar vs. Vector
CPUs (central processing units) are scalar processors which
execute instructions sequentially
RISC processors can exploit certain kinds of
instruction-level parallelism. In some cases they can execute
Modern processors (CISC and RISC) also support SIMD (single instruction - multiple
data) technology for certain applications involving DSP (digital
signal processing) or
In the Intel world, SIMD technology goes by the name
GPUs (graphics processing units) are vector
processors which easily execute parallel operations
AMD/ATI cards typically support anywhere between 800 and 200
(typically labeled "unified shaders")
NVidia cards typically support fewer streaming processors but
seem to be able to utilize them more efficiently
Since graphics cards have their own large memory systems, they
should be thought of as a private computer system within your
computer. Remember that this private computer is not going to be
continually trounced by real-world interrupts, etc.
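The scalar versus vector distinction described above can be pictured with a toy Python model. This is not real SIMD code (Python runs both versions one element at a time); it only counts how many "instructions" each style would issue. The lane count of 4 is an assumption matching a 128-bit SSE register holding four 32-bit floats, and both function names are made up for illustration.

```python
LANES = 4  # assumption: one 128-bit register holds four 32-bit floats (as in SSE)


def scalar_add(a, b):
    """Add two lists one element per step, the way a scalar unit would."""
    result, steps = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        steps += 1  # one add instruction per element
    return result, steps


def vector_add(a, b, lanes=LANES):
    """Add two lists one chunk of `lanes` elements per step (conceptual SIMD)."""
    result, steps = [], 0
    for i in range(0, len(a), lanes):
        chunk_a, chunk_b = a[i:i + lanes], b[i:i + lanes]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        steps += 1  # one vector instruction covers the whole chunk
    return result, steps


a = [float(n) for n in range(8)]
b = [10.0] * 8
print(scalar_add(a, b)[1], "scalar steps")
print(vector_add(a, b)[1], "vector steps")
```

A graphics card pushes the same idea much further: instead of four lanes, hundreds to thousands of streaming processors work on separate data elements at once.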
Using your NVidia graphics card to do
NVidia president and CEO, Jen-Hsun Huang (pronounced:
gen-son wang). This 38 minute interview from February 2009 features many
GPU-related topics including CUDA (Compute Unified Device Architecture). Quote: "The CPU is
irrelevant; now it's all about the GPU"
When preparing to run the Folding@home GPU client for the first
time, do not waste your time attempting to use the UPDATE DRIVER tool from
within the DEVICE MANAGER. Instead, go to the NVidia web site and download a
stand-alone driver application.
I run a mixture of systems employing graphics cards from both AMD/ATI
Most of my systems employ the HD-6670 from AMD/ATI.
Two of my
systems employ the GTX-560 from NVidia (I was forced to buy these
cards when AMD/ATI removed OpenCL support from their Windows-XP device
driver in the Spring of 2012)
When purchased new, the price of GTX-560 is
approximately twice that of the HD-6670 but seems to be doing 9-10 times more science even
though my NVidia cards are installed on older hardware platforms running
older operating systems.
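Those two ratios (about twice the price, 9-10 times the science) can be combined into a rough "science per dollar" comparison. The sketch below uses only the ratios quoted above, not actual prices or benchmark scores, and the function name is made up for illustration.

```python
def science_per_dollar_ratio(science_ratio: float, price_ratio: float) -> float:
    """Relative science output per dollar of card A versus card B.

    science_ratio: how many times more science card A does than card B
    price_ratio:   how many times more card A costs than card B
    """
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return science_ratio / price_ratio


# GTX-560 versus HD-6670, using the rough figures from the text
low = science_per_dollar_ratio(9, 2)
high = science_per_dollar_ratio(10, 2)
print(f"GTX-560 delivers roughly {low:.1f} to {high:.1f} times the science per dollar")
```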
Information from 2008 (still seems technically relevant today)
delayed release of Windows 7 1600/800 = 2.0 fold increase in one year (growth is currently exceeding Moore's law)
GeForce GTX 560
This graphics card is the most powerful folder in my
collection but look at the relatively (compared to AMD/ATI) low
number of shaders. The speed is due to
GPU1 and GPU2 technologies are no longer supported by folding-at-home
all clients may stop working when a certain date is reached. This is
normal behavior and you usually only need to download a newer client or
new device driver.
below for recent bad news for people running Radeon cards on Windows-XP
AMD/ATI GPU2 Graphics Card Caveat
Time and technology never stand still and the same is true for graphics
cards. You can imagine the difficulty science organizations experience
while trying to keep up with the constant introduction of new products from hardware
For the past half-decade the computer industry has been working on heterogeneous technologies
etc.) for doing science on graphics cards. Stanford
Folding Software requires OpenCL (Open Computing Language)
which must not be confused with OpenGL (Open Graphics Library).
Announcement: Stanford to drop GPU2 cards
made by AMD/ATI
In March-2012, Stanford University announced their intention to
drop AMD/ATI GPU2 cards in September, 2012.
host OS: Windows-Vista (SP2), Windows-7
alternatives for those with the wrong OS (or
obsolete graphics hardware)
Give up on GPU-based folding and only do CPU-based
Replace Windows with Linux (will be okay as long as AMD/ATI produces
an OpenCL driver for Linux)
Consider switching to
which continue to support OpenCL on older operating
This might be the best
science-wise decision. Why?
Many BOINC clients will only work with NVidia cards
supporting CUDA or PhysX (more science-friendly)
AMD/ATI cards only support: OpenCL
(depending on your OS) and sometimes DirectCompute
Most GTX NVidia cards support: OpenCL,
I tried to go the cheap route by purchasing used
NVidia cards on eBay but didn't have any luck (they
hardly seem to go up for resale).
So I was forced to buy new units from a retail
For folding purposes you should not buy anything
less than a
GeForce GTX 560 (I bought two of these)
I was able to trial a GeForce GT 520
before purchasing from a friend but found it
unacceptable for my needs (too slow)
As of this writing (2012-06-xx) you want a
part number with GTX (not GT) because "X"
versions do more science in hardware. But double-check
the specs because info at many retail web sites seems woefully inaccurate.
While older cards can no longer be used for science they
still can be used as video cards. I sold all my AMD/ATI cards on
subsidiary of eBay)
Stopped services may only be deleted from DOS like so:
sc query neil369
sc delete neil369
BOINC (Berkeley Open Infrastructure for Network Computing)
BOINC is a science framework in which you can support one or more projects of your choice.
If you are unable to pick a single cause then pick several because
the BOINC manager will switch between science clients every hour (this interval is adjustable). In my
case I actively support POEM, Rosetta, and Docking.
The current BOINC client can be programmed to use one, some, or all
cores of a multi-core machine. The BOINC client can also utilize (or
not) the streaming processors on your Graphics Card.
is the home of Rosetta@home
which operates through the BOINC framework. Their graphics screen-saver is one
very effective way to help visualize "what molecular dynamics is all about". Science teachers
must show this to their students.
I'm sure everyone already knows that a computer "rendering
beautiful graphical displays" is doing less science. That said,
humans are visual creatures and graphical displays have their place
in our society. Except for some public locations, all clients
should be running in
non-graphical mode so that more system resources are diverted to
Five questions for Rosetta@home: How Rosetta@home helps cure cancer,
AIDS, Alzheimer's, and more
some people may prefer to use the generic BOINC client from
Berkeley then install the WCG plugin from that application; you
will still need to create your WCG account at the WCG site
You only need to do this if you want to cycle your BOINC
client between multiple projects of which WCG is just one
If you only want to run the WCG project (which also switches
between IBM sponsored science projects) then it probably makes
more sense to use the WCG-specific client
World Community Grid (WCG) is an effort to create the world's largest public computing grid
to tackle scientific research projects that benefit humanity. Launched
2004-11-16, it is funded and operated by IBM with client software currently
available for Windows, Linux, Mac-OS-X and FreeBSD operating systems.
They encourage their employees and customers to do the same.
Personal Comment: I wonder why HP (Hewlett-Packard) has not followed IBM's lead.
Up until now I always thought of IBM as the template of uber-capitalism but it
seems that the title of "king of profit by the elimination of seemingly
superfluous expenses" goes to HP. Don't they realize that IBM's effort
in this area is done under IBM's advertising budgets? Just like IBM's 1990s
foray into chess playing systems (e.g. Deep Blue) led to increased sales as
well as share prices, one day IBM will be able to say "IBM
contributed to treatments for human diseases including cancer". IBM's
actions in this area reinforce the public's association with IBM and
Encyclopedia of DNA Elements (ENCODE) Consortium is an international
collaboration of research groups funded by the National Human Genome
Research Institute (NHGRI).
The goal of ENCODE is to build a comprehensive parts list of functional
elements in the human genome, including elements that act at the protein
and RNA levels, and regulatory elements that control cells and
circumstances in which a gene is active.
The Encyclopedia of DNA Elements (ENCODE) is a
public research consortium
launched by the
National Human Genome Research Institute (NHGRI) in September
2003. The goal is to find all functional elements in the human genome,
one of the most critical projects by NHGRI after it completed the
Human Genome Project. All data generated in the course of the
project will be released rapidly into public databases.
On 5 September 2012, initial results of the project were
released in a coordinated set of 30 papers published in the journals
Nature (6 publications),
Genome Biology (18 papers) and
Genome Research (6 papers). These publications combine to show
that approximately 20% of
noncoding DNA in the human genome is functional while an
additional 60% is transcribed with no known function. Much of this
functional non-coding DNA is involved in the
regulation of the
expression of coding genes.
Furthermore the expression of each coding gene is controlled by
multiple regulatory sites located both near and distant from the
gene. These results demonstrate that gene regulation is far more
complex than previously believed.
The Million-Core Problem - Stanford researchers break a supercomputing
barrier. quote: A team of Stanford researchers have broken a record in
supercomputing, using a million cores to model a complex fluid dynamics
problem. The computer is a newly installed Sequoia IBM Bluegene/Q system at
the Lawrence Livermore National Laboratories. Sequoia has 1,572,864
processors, reports Andrew Myers of Stanford Engineering, and 1.6
petabytes of memory.