Folding@Home

Tips and Advocacy

Click here for a low-tech version: Guaranteed Human Life Extension
 
You have heard the phrase "knowledge is power" but this may be the first time that "power is knowledge"
and
BOINC (Berkeley Open Infrastructure for Network Computing)
 
The Folding@home article at Wikipedia is more informative than this personal effort.
 
Dr. Isaac Asimov

Folding@home is biological research based upon the science of Molecular Dynamics where molecular chemistry and mathematics are combined in computer-based models to predict how protein molecules might fold in three spatial dimensions over time.


When I first heard about this, I recalled Isaac Asimov's sci-fi magnum opus colloquially known as The Foundation Trilogy which introduced the fictional branch of science called psychohistory where statistics, history and sociology are combined in computer-based models to predict humanity's future.
 
Years ago I became infected with an Asimov-inspired optimism about humanity's future and have since felt the need to contribute to it. While Folding@home will not cure my "infection of optimism", I am convinced Dr. Asimov (who received a Ph.D. in Biochemistry from Columbia in 1948 then was employed as a Professor of Biochemistry at the Boston University School of Medicine until 1958 when his writing workload became too large) would have been fascinated by something like this.

Dr. Asimov, I'm computing these protein folding sequences in memory of you and your work.

I was considering a financial charitable donation to Folding@home when it occurred to me that my money would be better spent by:
  1. Making a knowledgeable charitable donation to all of humanity by increasing my Folding@home computations (which will advance medical discoveries along with associated pharmaceutical treatments thus lengthening human life). I was already folding on a half-dozen computers anyway so all I needed to do was purchase used video cards on eBay.
     
  2. Convincing others (like you) to follow my example. My solitary folding efforts will have little effect on humanity's future. Together we can make a real difference.

Protein Folding Overview

Science Problem:

Misfolded proteins have been associated with numerous diseases as well as age-related illnesses. However, proteins are so much larger and more complicated than small molecules that it is not practical to begin a chemical experiment without first providing hints to researchers about where to look and what to look for. Since the behavior of "atoms-in-molecules" (Computational Chemistry) as well as "atoms-between-molecules" (Molecular Dynamics) can be modeled, it makes more sense to begin with a computer analysis. Permitted configurations can then be passed on to experimental researchers.

Real-world observation:

Cooking an egg causes the clear protein (albumen) to unfold into long strings, with the result that they now can intertwine into a tangled network which will stiffen and scatter light (appear white). No chemical change has occurred but taste, volume and color have been altered.

Click here to read a short "protein article" by Isaac Asimov published in 1993 shortly after his death.

Computer Solution:

Single CPU Systems

Using the most powerful single core processor (CPU) available today, simulating the folding possibilities of one large protein molecule for one millisecond of chemical time might require one million days (2737 years) of computational time. However (and this is where you come in), if the problem is sliced up then assigned to 100,000 personal computers over the internet, the computational requirement would drop to ten days. Convincing friends, relatives, and employers to also do the same would drop the computational requirement further.

chemical time              simulation time
in nature                  one computer                            100,000 computers   1 million computers
1 s  (1.0      seconds)    1,000,000,000 days (2,737,925 years)    27 years            2.7 years
1 ms (0.001    seconds)    1,000,000 days (2,737 years)            10 days             1 day
1 µs (0.000001 seconds)    1,000 days (2.73 years)                 14.4 mins           1.44 mins
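
The arithmetic behind the table can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the work divides perfectly evenly across machines with no network or scheduling overhead, one microsecond of chemical time costs 1,000 single-computer days (the figure used in the table), and a year is taken as 365.24 days. The function names are made up for this illustration.

```python
# Assumption: perfect scaling, i.e. N computers finish N times sooner.
DAYS_PER_YEAR = 365.24          # approximation that matches the table
DAYS_PER_MICROSECOND = 1_000    # single-computer cost taken from the table

def simulation_days(chemical_us, computers=1):
    """Wall-clock days needed to simulate `chemical_us` microseconds."""
    return DAYS_PER_MICROSECOND * chemical_us / computers

def describe(days):
    """Render a duration in the largest unit that fits (years, days, mins)."""
    if days >= DAYS_PER_YEAR:
        return f"{days / DAYS_PER_YEAR:,.2f} years"
    if days >= 1:
        return f"{days:,.2f} days"
    return f"{days * 24 * 60:,.2f} mins"

# One row per chemical time span, one column per fleet size.
for label, chemical_us in [("1 s", 1_000_000), ("1 ms", 1_000), ("1 us", 1)]:
    cells = [describe(simulation_days(chemical_us, n))
             for n in (1, 100_000, 1_000_000)]
    print(label, *cells, sep=" | ")
```

Real projects never scale this cleanly, so treat the numbers as upper bounds on the speedup rather than predictions.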

Additional information for techies, hackers and science buffs:


  1. Special-purpose research computers like IBM's BlueGene and Roadrunner employ 10,000 to 20,000 processors (CPUs) joined by many kilometers of optical fiber to do something similar in one location. (caveat: Roadrunner is a hybrid technology employing CPUs and special non-graphic GPUs called cell processors)
     
  2. As of May 2014, the Folding@home project consists of 244,000 active processors (some CPUs, some GPUs) which is equivalent to 3 PetaFLOPS. This means that the million-day protein simulation problem could theoretically be completed in (1,000,000/244,000) 4.1 days. But since there are many more protein molecules than DNA molecules, humanity could be at this for quite some time to come. Adding your computers to Folding@home will permanently advance mankind's progress in protein research.

    Personal Comments:

    There are almost 1 billion accounts registered on Facebook. Even if some of these entries represent organizations, I am shocked that there are fewer than one million "active processors" at Folding@home. We all know there are other distributed computing projects on the internet but "fewer than one million protein-folders" seems a crime against humanity.

     
  3. When the Human Genome Project (to study human DNA) was being planned, it was thought that the task may require 100 years. However, technological change in the area of computers, robotic sequencers, and use of the internet to coordinate the activities of a large number of universities (each assigned a small piece of the problem), allowed the human genome project to publish results after only 15 years. 

  4. Distributed computing projects like Folding@home and BOINC have only been possible since 1995 when the world-wide-web (which was first proposed in 1989 to solve a document sharing problem at CERN) began to make the internet popular and ubiquitous.
     
  5. Traditionally, processor technology was defined like this:
    1. Minicomputer / Workstation
    2. Microcomputer / Desktop
    Then CISC and RISC vendors began to add vector processing instructions to their processor chips, which blurred everything:
      1. 1997: MMX was implemented on P55C (a.k.a. Pentium MMX) from Intel
        • the first offering introduced 57 MMX-specific instructions
      2. 1998: 3DNow! was implemented on AMD K6-2
      3. 1999: AltiVec (also called "VMX" by IBM and "Velocity Engine" by Apple) was implemented on PowerPC G4 from Motorola
      4. 1999: SSE (Streaming SIMD Extensions) was implemented on Pentium III "Katmai" from Intel.
        1. this technology employs 128-bit registers
        2. SSE was Intel's reply to AMD's 3DNow!
        3. SSE replaces MMX (both are SIMD but SSE uses its own floating point registers)
      5. 2001: SSE2 was implemented on Pentium 4 from Intel
      6. 2004: SSE3 was implemented on Pentium 4 "Prescott" from Intel
      7. 2006: SSE4 was implemented on Intel Core and AMD K10
      8. 2008: AVX (Advanced Vector Extensions) proposed by Intel + AMD but not seen until 2011
        1. this technology employs 256-bit registers
           
    But GPUs (graphics processing units) take vector processing to a whole new level. Why? A $150.00 graphics card can now equip your system with 1500-2000 streaming processors and 2-4 GB of additional high speed memory.
    I've been in the computer hardware-software business for a while now and can confirm that computers have only started to get really interesting again this side of 2007 with the releases of CUDA, OpenCL, etc.
  6. Distributed computing projects like Folding@home and BOINC have only been practical since 2005 when the CPUs in personal computers began to out-perform mini-computers and enterprise servers. This was partly because...
    1. AMD added 64-bit support to their x86 processor technology calling it x86-64.
    2. Intel followed suit calling their 64-bit extension technology EM64T
    3. DDR2 (fast) memory became popular
    4. Intel added DDR2 support to their Pentium 4 processor line
    5. AMD added DDR2 support to their Athlon 64 processor line
    6. Both AMD and Intel began supporting DDR3 memory
       
  7. Since then, the following list of technological improvements has only made computers both faster and cheaper:
    1. multi-core (each core is a fully functional CPU) chips from all manufacturers
    2. shifting analysis from each CPU core into multiple (hundreds to thousands) streaming processors found in high-end graphics cards
      1. ATI (now AMD) Radeon graphics cards
      2. NVidia GeForce graphics cards
      3. development of high performance "graphics" memory technology (e.g. GDDR3, GDDR4, GDDR5) to bypass processing stalls caused when processors are too fast. Note that GDDR5 will represent main memory in the not-yet-released PlayStation 4 (PS4)
    3. Intel's abandonment of NetBurst which meant a return to shorter instruction pipelines starting with Core2
      Comment: AMD never went to longer pipelines; a long pipeline is only efficient when running a static CPU benchmark for marketing purposes - not running code in real-world where i/o events interrupt the primary foreground task (science in our case)
    4. introduction of DDR3 memory
    5. Intel replacing 20-year old FSB technology with a new approach called QuickPath Interconnect (QPI). See their Core i7 chips.
      Historical note:
      1. this technology was invented by DEC for their Alpha CPUs and named CSI (Common System Interconnect).
      2. Compaq bought DEC in 1998.
      3. The Alpha Engineering team and Alpha Technology were sold to Intel in 2001 during merger discussions between HP and Compaq.
      4. The merger was completed in 2002.
    6. The AMD equivalent of QPI is called HyperTransport which has been described as a multipath Ethernet targeted for use within a computer system.
       
  8. As is true in any "demand vs. supply" scenario, most consumers didn't need the additional computing power which meant that chip manufacturers had to drop their prices just to keep the computing marketplace moving. This was good news for people setting up "folding farms". Something similar is happening today with computer systems since John Q. Public is shifting from "towers and desktops" to "laptops and pads". This is causing the price of towers and graphics cards to plummet ever lower. You just can't beat the price-performance ratio of a Core i7 motherboard hosting an NVidia graphics card. (prediction: laptops and pads will never ever be able to fold as well as a tower; towers will always be around in some form; low form-factor desktops might become extinct)
     
  9. Shifting from brute-force "Chemical Equilibrium" algorithms to techniques involving Bayesian statistics and Markov Models will enable some exponential speedups.
     
  10. Computational Chemistry

    [Figure: Liquid Water. This diagram depicts an H2O molecule loosely connected to four others.]

    Question: After perusing the periodic table for a moment you will realize that the molecular mass of water (H2O) is ~18 while the molecular mass of oxygen (O2) is ~32, carbon dioxide (CO2) is ~44 and ozone (O3) is ~48. So why is H2O in a liquid state at room temperature while other slightly heavier molecules take the form of a gas?
     
    Substance          Molecule   Atomic Masses    Molecular Mass   State at Room Temperature
    Water              H2O        (1x2)+16         18               liquid
    Molecular Oxygen   O2         (16x2)           32               gas
    Carbon Dioxide     CO2        12+(16x2)        44               gas
    Ozone              O3         (16x3)           48               gas
    Methane            CH4        12+(1x4)         16               gas
    Ethane             C2H6       (12x2)+(1x6)     30               gas
    Propane            C3H8       (12x3)+(1x8)     44               gas
    Butane             C4H10      (12x4)+(1x10)    58               gas
    Pentane            C5H12      (12x5)+(1x12)    72               liquid
    Hexane             C6H14      (12x6)+(1x14)    86               liquid
    Heptane            C7H16      (12x7)+(1x16)    100              liquid
    Octane             C8H18      (12x8)+(1x18)    114              liquid
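
The "Molecular Mass" column can be checked with a short Python sketch. It assumes the same rounded atomic masses the table uses (H=1, C=12, O=16) and only handles simple formulas without brackets. The function name and the lookup table are made up for this illustration.

```python
import re

# Rounded atomic masses, as used in the table (assumption: whole numbers
# are close enough for this comparison).
ATOMIC_MASS = {"H": 1, "C": 12, "O": 16}

def molecular_mass(formula):
    """Sum atomic masses for a simple formula such as 'H2O' or 'C8H18'."""
    total = 0
    # Each match is an element symbol followed by an optional count.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total

for formula in ("H2O", "O2", "CO2", "O3", "CH4", "C8H18"):
    print(formula, molecular_mass(formula))
```

The calculation makes the puzzle concrete: water comes out lighter than every gas in the table except methane, yet it is the one that is liquid.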

    Short answer: In an H2O molecule, two hydrogen atoms are covalently bound to one oxygen atom, but the oxygen atom also carries two unshared pairs of electrons which repel the bonding electrons and bend the molecule into a V shape. This leaves the oxygen side with a slight negative charge and the hydrogen side with a slight positive charge, which allows a weak connection between the oxygen atom of one molecule and a hydrogen atom of a neighboring H2O molecule (water molecules weakly sticking to each other form a liquid). These weak connections are called hydrogen bonds and are usually discussed alongside the weaker intermolecular attractions known as Van der Waals forces
     
    Van der Waals did all his computations with pencil and paper long before the computer was invented but it was only possible because the molecules involved were small.
     
    Chemistry Caveat: The compound table above was only meant to get you thinking because Molecular Mass is not all there is to the picture. Getting back to the periodic table for a moment will show:
     
    1. all elements in column 1 (except hydrogen) are solid at room temperature
       
    2. all elements in column 8 (helium to radon) are gaseous at room temperature
       
    3. Half the elements in row 2 starting with Lithium (atomic number 3) and ending with Carbon (atomic number 6),
      as well as three quarters of row 3 starting with Sodium (atomic number 11) and ending with Sulphur (atomic number 16),
      are solid at room temperature.
       
    I will leave it to you to determine why.

    hint:
    the answer also involves the repulsive force between electrons as well as the attractive force between electrons and protons.
     
  11. Molecular Dynamics

    Proteins come in many shapes and sizes. Here is a very short list:

    Protein          ~ Mass                    Function                                                                  Notes
    Chlorophyll a    893                       facilitates photosynthesis in plants
    Heme A           852                       common ligand for many hemeproteins including hemoglobin and myoglobin
    Alpha-amylase    56,031                    salivary enzyme to digest starch                                          pdbId=1SMD
    Hemoglobin       64,458                    red blood cell protein
    DNA polymerase   varies from 50k to 200k   enzyme responsible for DNA replication

    These molecules are so large that modeling their intramolecular and intermolecular interactions can only be done accurately with a computer.

(mostly) Stanford School of Medicine - Links

Folding@home - Stanford School of Medicine

http://folding.stanford.edu : main home page including software download tool (95% of people need only click here)
http://folding.stanford.edu/home/
http://folding.stanford.edu/home/blog : blog of Vijay Pande (4% will want to see this)
http://foldingforum.org : problem discussions, news, science, etc. (1% will want to see this)

Extra Stuff

(Stanford's) Targeted Diseases

This "folding knowledge" will be used to develop new drugs for treating diseases such as:

Reference Links: Folding@home - FAQ Diseases

More Information About Proteins and Protein-Folding Science

Protein Videos

Online Documents

My Computational Statistics

Stream Computing via graphics cards

Executive Summary: while a single-core Pentium-class CPU also provides one streaming (vector) processor under marketing names like MMX and SSE, one graphics card can provide hundreds to thousands.

Stream Computing

Scalar vs. Vector

  1. CPUs (central processing units) are scalar processors which execute instructions sequentially
    • RISC processors can exploit certain kinds of instruction-level parallelism. In some cases they can execute instructions out-of-order.
    • Modern processors (CISC and RISC) also support SIMD (single instruction - multiple data) technology for certain applications involving DSP (digital signal processing) or multi-media.
    • In the Intel world, SIMD technology goes by the name MMX/SSE/SSE2, etc.
  2. GPUs (graphics processing units) are vector processors which easily execute parallel operations
    • AMD/ATI cards typically support anywhere between 800 and 2000 streaming processors (typically labeled "unified shaders")
    • NVidia cards typically support fewer streaming processors but seem to be able to utilize them more efficiently
    • Since graphics cards have their own large memory systems, they should be thought of as a private computer system within your computer. Remember that this private computer is not going to be continually trounced by real-world interrupts, etc.
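
The scalar-versus-vector distinction above can be illustrated with a toy Python model. It is not real SIMD (Python still runs every addition one at a time); it only counts how many "instruction steps" each style would need. The lane width of 4 is an assumption chosen to mirror a 128-bit SSE register holding four 32-bit floats, and both function names are made up for this illustration.

```python
def scalar_add(a, b):
    """Add element by element: one addition per step."""
    result, steps = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        steps += 1
    return result, steps

def vector_add(a, b, lanes=4):
    """Add in groups of `lanes` elements: one group counts as one step."""
    result, steps = [], 0
    for i in range(0, len(a), lanes):
        # A SIMD unit would perform these `lanes` additions simultaneously.
        result.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        steps += 1
    return result, steps

a = list(range(16))
b = list(range(16))
print(scalar_add(a, b)[1], "scalar steps")
print(vector_add(a, b)[1], "vector steps with 4 lanes")
print(vector_add(a, b, lanes=16)[1], "vector step with 16 lanes")
```

Raising `lanes` into the hundreds or thousands gives a feel for why a graphics card with that many streaming processors gets through this kind of work so quickly.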

Using your NVidia graphics card to do protein-folding science

My Personal Experience Doing GPU-based Science:
  1. I run a mixture of systems employing graphics cards from both AMD/ATI and NVidia.
  2. Most of my systems employ the HD-6670 from AMD/ATI.
  3. Two of my systems employ the GTX-560 from NVidia (I was forced to buy these cards when AMD/ATI removed OpenCL support from their Windows-XP device driver in the Spring of 2012)
  4. When purchased new, the price of GTX-560 is approximately twice that of the HD-6670 but seems to be doing 9-10 times more science even though my NVidia cards are installed on older hardware platforms running older operating systems.
  5. If you are setting up protein folding systems for charitable humanitarian purposes then consider spending the extra money to buy an NVidia GTX-560
  6. In 2013 the GTX-560 product was replaced by the GTX-660
    1. Make sure you buy something with a GTX prefix rather than a GE prefix
    2. Make sure the last two digits are 60, 70, 80 or 90.
  7. In 2014 you want to buy a GTX-760
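
The price comparison in item 4 can be turned into a rough "science per dollar" figure. The sketch below uses only the ratios quoted there (about twice the price, 9 to 10 times the output); the function name is made up for this illustration, and the result reflects one person's experience rather than a benchmark.

```python
def relative_value(price_ratio, output_ratio):
    """How much more science per dollar the pricier card delivers."""
    return output_ratio / price_ratio

# GTX-560 versus HD-6670, using the ratios reported in the list above.
low = relative_value(price_ratio=2, output_ratio=9)
high = relative_value(price_ratio=2, output_ratio=10)
print(f"roughly {low} to {high} times more science per dollar")
```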

Table: GTX-6xx Technical Comparisons

Using your ATI graphics card to do protein-folding science

Historical chart showing the rapid evolution of video card technology (2006-2012) 

Card               MFR       Shaders (CPUs)   Shader Notes              GPU Technology   GPU Chip    Year   Notes
Radeon x1950 Pro   AMD/ATI   8 : 36           Vertex : Pixel            GPU1             RV570       2006   36 pixel shaders
Radeon x1950 XTX   AMD/ATI   8 : 48           Vertex : Pixel            GPU1             R580        2007   48 pixel shaders
Radeon HD3870      AMD/ATI   320              unified (programmable)    GPU2             RV670       2007   unified (programmable) shaders
Radeon HD4870      AMD/ATI   800              unified (programmable)    GPU2             RV770       2008   unified (programmable) shaders; 800/36 = 22 fold increase in only two years (growth is currently exceeding Moore's law)
Radeon HD5870      AMD/ATI   1600             unified (programmable)    GPU3             RV870       2009   synchronized with delayed release of Windows 7; 1600/800 = 2.0 fold increase in one year (growth is currently exceeding Moore's law)
Radeon HD6970      AMD/ATI   1536             unified (programmable)    GPU3             Cayman XT   2010
Radeon HD7970      AMD/ATI   2048             unified (programmable)    GPU3             Tahiti XT   2012

GeForce GTX 560    NVidia    288:48:24        Vertex:Geometry:Pixel     GPU3             GF114       2011   This graphics card is the most powerful folder in my collection but look at the relatively (compared to AMD/ATI) low number of shaders. The speed is due to architectural differences.
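
The two "fold increase" notes in the chart can be compared against Moore's law with a short Python sketch. It assumes the common reading of Moore's law as a doubling roughly every two years and compares it to raw shader counts, which is a loose comparison since Moore's law is about transistor counts. The function names are made up for this illustration.

```python
def fold_increase(old_count, new_count):
    """Observed growth factor between two shader counts."""
    return new_count / old_count

def moores_law_factor(years, doubling_period=2.0):
    """Growth factor expected if capacity doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)

# (label, older shader count, newer shader count, years between releases)
comparisons = [
    ("x1950 Pro -> HD4870", 36, 800, 2),
    ("HD4870 -> next generation", 800, 1600, 1),
]
for label, old, new, years in comparisons:
    observed = fold_increase(old, new)
    expected = moores_law_factor(years)
    print(f"{label}: observed {observed:.1f}x, Moore's law {expected:.1f}x")
```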

ATI-GPU Caveats:

AMD/ATI GPU2 Graphics Card Caveat (2012)

Time and technology never stand still and the same is true for graphics cards. You can imagine the difficulty science organizations experience while trying to keep up with the constant introduction of new products from hardware manufacturers. For the past half-decade the computer industry has been working on heterogeneous technologies (OpenCL, CUDA, PhysX, DirectCompute, etc.) for doing science on graphics cards. Stanford Folding Software requires OpenCL (Open Computing Language) which must not be confused with OpenGL (Open Graphics Library).

Announcement: Stanford to drop GPU2 cards made by AMD/ATI

GPU folding on Windows-XP is no longer supported by AMD/ATI

Microsoft Windows Scripting and Programming

  1. MS-DOS/MSDOS Batch Files: Batch File Tutorial and Reference
  2. MS-DOS @wikipedia
  3. Batch file @wikipedia
  4. Microsoft Windows XP - Batch files

Experimental Stuff for Windows Hackers and Gurus

Here are some DOS commands for creating, and starting, a Windows Service to execute a DOS script.

sc create neil369 binpath= "cmd /k start c:\folding-0\neil987.bat" type= own type= interact
sc start  neil369

Once created, you can stop/start/modify a service graphically from this Windows location:

                Start >>  Programs >> Administrative Tools >> Services

Stopped services may only be deleted from DOS like so:

sc query  neil369
sc delete neil369

BOINC

BOINC (Berkeley Open Infrastructure for Network Computing)

Combined BOINC Stats: Neil Steven Rieck

BOINC (Berkeley Open Infrastructure for Network Computing) is a science framework in which you can support one, or more, projects of choice.

Protein / Biology / Medicine Projects

POEM@home (via BOINC)

Rosetta@home (via BOINC)

World Community Grid (via BOINC)

Personal Comment: I wonder why HP (Hewlett-Packard) has not followed IBM's lead. Up until now I always thought of IBM as the template of uber-capitalism but it seems that the title of "king of profit by the elimination of seemingly superfluous expenses" goes to HP. Don't they realize that IBM's effort in this area is done under IBM's advertising budgets? Just like IBM's 1990s foray into chess playing systems (e.g. Deep Blue) led to increased sales as well as share prices, one day IBM will be able to say "IBM contributed to treatments for human diseases including cancer". IBM's actions in this area reinforce the public's association of IBM with information processing.

Biology Science Links

Protein Data Bank Links

ENCODE

The Encyclopedia of DNA Elements (ENCODE) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active.

Local Links

(noteworthy) Remote Links

Recommended Biology Books (I own them all)


Neil Rieck
Kitchener - Waterloo - Cambridge, Ontario, Canada.