Folding@home + BOINC: Tips and Advocacy

Click here for a low-tech version: Guaranteed Human Life Extension
In 1597, Francis Bacon said "knowledge is power", but this may be the first time that power is knowledge.

The Folding@home article at Wikipedia is more informative than this personal effort.
 

Recent News
  1. I started this project on 2007-09-30 with ~ 10 computers
  2. On 2008-06-14 (258 days later) I reached one million (1,000,000) units
  3. I began purchasing used Graphics Cards from eBay to speed up calculations
  4. The second million only required 114 days
  5. On 2016-09-25 I passed one hundred million (100,000,000) units
    (~ 5 computers; doing 1 million every 20 days)

Dr. Isaac Asimov


Folding@home is biological research based upon the science of Molecular Dynamics where molecular chemistry and mathematics are combined in computer-based models to predict how protein molecules might fold in three spatial dimensions over time.


When I first heard about this, I recalled Isaac Asimov's sci-fi magnum opus colloquially known as The Foundation Trilogy which introduced the fictional branch of science called psychohistory where statistics, history and sociology are combined in computer-based models to predict humanity's future.
 
Years ago I became infected with an Asimov-inspired optimism about humanity's future and have since felt the need to contribute to it. While Folding@home will not cure my "infection of optimism", I am convinced Dr. Asimov would have been fascinated by something like this. (He received a Ph.D. in Biochemistry from Columbia in 1948, then was employed as a Professor of Biochemistry at the Boston University School of Medicine until 1958, when his writing workload became too large.)

Dr. Asimov, I'm computing these protein folding sequences in memory of you, and your work.

I was considering a financial charitable donation to Folding@home when it occurred to me that my money would be better spent by:
  1. Making a knowledgeable charitable donation to all of humanity by increasing my Folding@home computations (which will advance medical discoveries along with associated pharmaceutical treatments thus lengthening human life). I was already folding on a half-dozen computers anyway so all I needed to do was purchase used video cards on eBay.
     
  2. Convincing others (like you) to follow my example. My solitary folding efforts will have little effect on humanity's future. Together we can make a real difference.


Protein Folding Overview

Science Problem

Misfolded proteins have been associated with numerous diseases and age-related illnesses. However, proteins are so much larger and more complicated than smaller molecules that it is not practical to begin a chemical experiment without first giving researchers hints about where to look and what to look for. Since the behavior of atoms-in-molecules (Computational Chemistry) as well as atoms-between-molecules (Molecular Dynamics) can be modeled, it makes more sense to begin with a computer analysis. Permitted configurations can then be passed on to experimental researchers.

Real-world observation

Cooking an egg causes the clear protein (albumen) to unfold into long strings which now can intertwine into a tangled network which will stiffen then scatter light (appear white). No chemical change has occurred but taste, volume and color have been altered.

Click here to read a short "protein article" by Isaac Asimov published in 1993 shortly after his death.

Computer Solution

Single CPU Systems

Using the most powerful single core processor (CPU) available today, simulating the folding possibilities of one large protein molecule for one millisecond of chemical time might require one million days (2737 years) of computational time. However (and this is where you come in), if the problem is sliced up then assigned to 100,000 personal computers over the internet, the computational requirement would drop to ten days. Convincing friends, relatives, and employers to also do the same would drop the computational requirement further.

chemical time              simulation time
in nature                  one computer                            100,000 computers  1 million computers
1 s  (1.0      seconds)    1,000,000,000 days (2,737,925 years)    27 years           2.7 years
1 ms (0.001    seconds)    1,000,000 days     (2,737 years)        10 days            1 day
1 µs (0.000001 seconds)    1,000 days         (2.73 years)         14.4 mins          1.44 mins
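
The arithmetic behind this table can be sketched in a few lines of Python. This is only an illustration, not anything taken from the Folding@home software: it assumes the work divides perfectly evenly with no network or coordination overhead, it takes the "one million days per millisecond" figure from the text above, and the names `wall_clock_days` and `describe` are made up for this example.

```python
DAYS_PER_YEAR = 365.25
# Figure taken from the text above: simulating 1 ms of chemical time
# costs about one million days on one single-core computer.
SINGLE_COMPUTER_DAYS_PER_MS = 1_000_000


def wall_clock_days(single_computer_days, computers):
    """Days required if the work divides evenly (idealized: no overhead)."""
    return single_computer_days / computers


def describe(days):
    """Render a duration in the most readable unit."""
    if days >= DAYS_PER_YEAR:
        return f"{days / DAYS_PER_YEAR:,.1f} years"
    if days >= 1:
        return f"{days:,.1f} days"
    return f"{days * 24 * 60:.2f} minutes"


if __name__ == "__main__":
    # One row per amount of chemical time, one cell per fleet size.
    for label, chemical_ms in (("1 s", 1000), ("1 ms", 1), ("1 µs", 0.001)):
        single = SINGLE_COMPUTER_DAYS_PER_MS * chemical_ms
        cells = [describe(wall_clock_days(single, n))
                 for n in (1, 100_000, 1_000_000)]
        print(f"{label:5}", " | ".join(cells))
```

Real projects never scale this cleanly, so treat the results as a best case.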

Additional information for techies, hackers and science buffs

  1. Special-purpose research computers like IBM's BlueGene and Roadrunner employ 10,000 to 20,000 processors (CPUs) joined by many kilometers of optical fiber to do something similar in one location. (caveat: Roadrunner is a hybrid technology employing CPUs and special non-graphic GPUs called cell processors)
     
  2. As of May 2016, the Folding@home project consists of 102,733 active computers (some CPUs, some GPUs) which is equivalent to 97,174 TeraFLOPS (~ 100 PetaFLOPS). This means that the million-day protein simulation problem could theoretically be completed in about 9.7 days (1,000,000/102,733) but since there are many more protein molecules than DNA molecules, humanity could be at this for quite some time. Adding your computers to Folding@home will permanently advance mankind's progress in protein research.

    These numbers used to be much higher before a large fraction of society shifted from PCs to tablets. Around the same time, lower-end PCs began to include embedded graphics chips while dropping expansion slots. Does this mean that only higher-power PCs and gaming rigs will be contributing to distributed computing projects? Perhaps.
     
  3. When the Human Genome Project (to study human DNA) was being planned, it was thought that the task might require 100 years. However, technological change in the area of computers, robotic sequencers, and use of the internet to coordinate the activities of a large number of universities (each assigned a small piece of the problem), allowed the Human Genome Project to publish results after only 15 years. That is roughly 6.7 times faster than first estimated (100/15).

  4. Distributed computing projects like Folding@home and BOINC have only been possible since 1995 when the world-wide-web (which was first proposed in 1989 to solve a document sharing problem amongst scientists at CERN in Geneva) began to make the internet both popular and ubiquitous.
     
  5. Processor technology was traditionally defined like this:
    Then CISC and RISC vendors began to add vector processing instructions to their processor chips which blurred everything
    1. Minicomputer / Workstation
      1. 1989: DEC adds vector processing capabilities to their Rigel microprocessor
      2. 1989: DEC adds optional vector processing to VAX-6000 model 400 (called VAXvector)
      3. 1994: VIS 1 (Visual Instruction Set) was introduced into UltraSPARC processors by SUN
      4. 1996: MDMX (MIPS Digital Media eXtension) is released by MIPS
      5. 1997: MVI (Motion Video Instructions) was implemented on Alpha 21164PC from DEC/Compaq. MVI appears again in Alpha 21264 and Alpha 21364.
    2. Microcomputer / Desktop
      1. 1997: MMX was implemented on P55C (a.k.a. Pentium MMX) from Intel
        • the first offering introduced 57 MMX-specific instructions
      2. 1998: 3DNow! was implemented on AMD K6-2
      3. 1999: AltiVec (also called "VMX" by IBM and "Velocity Engine" by Apple) was implemented on PowerPC G4 from Motorola
      4. 1999: SSE (Streaming SIMD Extensions) was implemented on Pentium 3 "Katmai" from Intel.
        1. this technology employs 128-bit registers
        2. SSE was Intel's reply to AMD's 3DNow!
        3. SSE replaces MMX (both are SIMD but SSE uses its own floating point registers)
      5. 2001: SSE2 was implemented on Pentium 4 from Intel
      6. 2004: SSE3 was implemented on Pentium 4 "Prescott" from Intel
      7. 2006: SSE4 was implemented on Intel Core and AMD K10
      8. 2008: AVX (Advanced Vector Extensions) proposed by Intel + AMD but not seen until 2011
        1. many components extended to 256-bits
      9. 2012: AVX2 (more components extended to 256-bits)
      10. 2015: AVX-512 (512-bit extensions)
         
    But GPUs (graphics processing units) take vector processing to a whole new level. Why? A $150.00 graphics card can now equip your system with 1500-2000 streaming processors and 2-4 GB of additional high speed memory.
    I've been in the computer hardware-software business for a while now but can confirm that computers have only started to get real interesting again this side of 2007 with the releases of CUDA, OpenCL, etc.
  6. Distributed computing projects like Folding@home and BOINC have only been practical since 2005 when the CPUs in personal computers began to out-perform mini-computers and enterprise servers. This was partly because...
    1. AMD added 64-bit support to their x86 processor technology calling it x86-64.
    2. Intel followed suit calling their 64-bit extension technology EM64T
    3. DDR2 (fast) memory became popular
    4. Intel added DDR2 support to their Pentium 4 processor line
    5. AMD added DDR2 support to their Athlon 64 processor line
    6. Both AMD and Intel began supporting DDR3 memory
       
  7. Since then, the following list of technological improvements has only made computers both faster and cheaper:
    1. multi-core (each core is a fully functional CPU) chips from all manufacturers
    2. shifting analysis from each CPU core into multiple (hundreds to thousands) streaming processors found in high-end graphics cards
      1. ATI (now AMD) Radeon graphics cards
      2. NVidia GeForce graphics cards
      3. development of high performance "graphics" memory technology (e.g. GDDR3, GDDR4, GDDR5) to bypass processing stalls caused when processors outrun their memory. Note that GDDR5 serves as main memory in the PlayStation 4 (PS4)
    3. Intel's abandonment of NetBurst which meant a return to shorter instruction pipelines starting with Core2
      Comment: AMD never went to longer pipelines; a long pipeline is only efficient when running a static CPU benchmark for marketing purposes, not when running code in the real world where i/o events interrupt the primary foreground task (science in our case)
    4. introduction of DDR3 memory
    5. Intel replacing 20-year-old FSB technology with a proprietary new approach called QuickPath Interconnect (QPI) which is now found in Core i3, Core i5, Core i7 and Xeon
      Historical note:
      1. DEC created the 64-bit Alpha processor which was first announced in 1992 (21064 was first, 21164, 21264, 21364, came later)
      2. Compaq bought DEC in 1998
      3. The DEC division of Compaq created CSI (Common System Interface) for use in their EV8 Alpha processor which was never released
      4. HP merged with Compaq in 2002
      5. HP preferred Itanium2 (jointly developed by HP and Intel) so announced their intention to gracefully shut down Alpha (it would take more than a year to boot OpenVMS on Itanium2 and another year for big-system qualification tests)
      6. Alpha technology (which included CSI) was immediately sold to Intel
      7. approximately 300 Alpha engineers were transferred to Intel between 2002 and 2004
      8. CSI morphed into QPI (some industry watchers say that Intel ignored CSI until the announcement by AMD to go with the industry-supported technology known as HyperTransport)
      9. QPI video: http://www.intel.com/content/www/us/en/performance/performance-quickpath-architecture-demo.html <--- nerd alert! this video is way too cool!
    6. The remainder of the industry went with a non-proprietary technology called HyperTransport which has been described as a multipoint Ethernet for use within a computer system.
       
  8. As is true in any "demand vs. supply" scenario, most consumers didn't need the additional computing power which meant that chip manufacturers had to drop their prices just to keep the computing marketplace moving. This was good news for people setting up "folding farms". Something similar is happening today with computer systems since John Q. Public is shifting from "towers and desktops" to "laptops and pads". This is causing the price of towers and graphics cards to plummet ever lower. You just can't beat the price-performance ratio of a Core i7 motherboard hosting an NVidia graphics card. (prediction: laptops and pads will never ever be able to fold as well as a tower; towers will always be around in some form; low form-factor desktops might become extinct)
     
  9. Shifting from brute-force "Chemical Equilibrium" algorithms to techniques involving Bayesian statistics and Markov Models will enable some exponential speedups.
     
  10. Computational Chemistry

    [Figure: Liquid Water. This diagram depicts an H2O molecule loosely connected to four others.]

    Question: After perusing the periodic table of the elements for a moment you will soon realize that the molecular mass of water (H2O) is ~18 while the molecular mass of oxygen (O2) is ~32, carbon dioxide (CO2) is ~44 and ozone (O3) is ~48. So why is H2O in a liquid state at room temperature while other slightly heavier molecules take the form of a gas?
     
    Substance         Molecule  Atomic Masses    Molecular Mass  State at Room Temperature
    Water             H2O       (1x2)+16         18              liquid
    Molecular Oxygen  O2        (16x2)           32              gas
    Carbon Dioxide    CO2       12+(16x2)        44              gas
    Ozone             O3        (16x3)           48              gas
    Methane           CH4       12+(1x4)         16              gas
    Ethane            C2H6      (12x2)+(1x6)     30              gas
    Propane           C3H8      (12x3)+(1x8)     44              gas
    Butane            C4H10     (12x4)+(1x10)    58              gas
    Pentane           C5H12     (12x5)+(1x12)    72              liquid
    Hexane            C6H14     (12x6)+(1x14)    86              liquid
    Heptane           C7H16     (12x7)+(1x16)    100             liquid
    Octane            C8H18     (12x8)+(1x18)    114             liquid

    Short answer: In an H2O molecule, the oxygen atom holds two pairs of electrons that are not shared with the hydrogen atoms. These unshared pairs repel the two oxygen-hydrogen bonds, which bends the water molecule into a V shape. Oxygen also pulls the shared electrons toward itself, so the oxygen side of the bend carries a slight negative charge while each hydrogen atom carries a slight positive charge. This allows a weak connection between the oxygen of one molecule and the hydrogen atom of a neighboring H2O molecule (water molecules weakly sticking to each other form a liquid). These weak connections are called hydrogen bonds, a stronger cousin of the Van der Waals forces that act between all molecules.
     
    Van der Waals did all his computations with pencil and paper long before the computer was invented and this was only possible because the molecules in question were small and few.
     
    Chemistry Caveat: The Molecular Table above was only meant to get you thinking. Now inspect this LARGER periodic table of the elements where the color of the atomic number indicates whether solid or gaseous:
     
    1. all elements in column 1 (except hydrogen) are naturally solid
       
    2. all elements in column 8 (helium to radon) are naturally gaseous
       
    3. half the elements in row 2 starting with Lithium (atomic number 3) and ending with Carbon (atomic number 6),
      as well as three quarters of row 3 starting with Sodium (atomic number 11) and ending with Sulphur (atomic number 16),
      are naturally solid
       
    I will leave it to you to determine why
     
  11. Molecular Dynamics

    Proteins come in many shapes and sizes. Here is a very short list:

    Protein         ~ Mass                   Function                                                                Notes
    Chlorophyll a   893                      facilitates photosynthesis in plants
    Heme A          852                      common ligand for many hemeproteins including hemoglobin and myoglobin
    Alpha-amylase   56,031                   salivary enzyme to digest starch                                        pdbId=1SMD
    Hemoglobin      64,458                   red blood cell protein
    DNA polymerase  varies from 50k to 200k  enzyme responsible for DNA replication

    These molecules are so large that modeling the interactions within and between them can only be done accurately with a computer
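
The molecular masses in the water-versus-alkanes table of item 10 above are simple sums, and a short Python sketch can reproduce them. It is an illustration only: the `ATOMIC_MASS` dictionary holds the same rounded masses used in that table (H=1, C=12, O=16), and `molecular_mass` is a made-up helper that handles plain formulas without parentheses.

```python
import re

# Rounded atomic masses, the same values the table in item 10 uses.
ATOMIC_MASS = {"H": 1, "C": 12, "O": 16}


def molecular_mass(formula):
    """Sum atomic masses for a simple formula such as 'H2O' or 'C8H18'."""
    total = 0
    # Each match is an element symbol followed by an optional count.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


if __name__ == "__main__":
    for formula in ("H2O", "O2", "CO2", "O3", "CH4", "C5H12", "C8H18"):
        print(formula, molecular_mass(formula))
```

Running it shows water as the lightest liquid in the list by a wide margin, which is the puzzle the question in item 10 poses. The same bookkeeping applied to a protein with tens of thousands of atoms gives only its mass; predicting its shape is the part that needs a folding farm.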

(mostly) Stanford School of Medicine - Links

Folding@home - Stanford School of Medicine

http://folding.stanford.edu main home page including software download tool
http://folding.stanford.edu/home/blog blog of Vijay Pande
http://foldingforum.org problem discussions, news, science, etc.

Extra Stuff

(Stanford's) Targeted Diseases

This "folding knowledge" will be used to develop new drugs for treating diseases such as:

Reference Links: Folding@home - FAQ Diseases

More Information About Proteins and Protein-Folding Science

Protein Videos

Online Documents

My Computational Statistics

Stream Computing via graphics cards

Executive Summary: While a single-core Pentium-class CPU also provides one streaming (vector) processor under marketing names like MMX and SSE, one graphics card can provide hundreds to thousands.

Stream Computing

Scalar vs. Vector

  1. CPUs (central processing units) are scalar processors which execute instructions sequentially
    • RISC processors can exploit certain kinds of instruction-level parallelism. In some cases they can execute instructions out-of-order.
    • Modern processors (CISC and RISC) also support SIMD (single instruction - multiple data) technology for certain applications involving DSP (digital signal processing) or multi-media.
    • In the Intel world, SIMD technology goes by the name MMX/SSE/SSE2, etc.
  2. GPUs (graphics processing units) are vector processors which easily execute parallel operations
    • AMD/ATI cards typically support anywhere between 800 and 2000 streaming processors (typically labeled "unified shaders")
    • NVidia cards typically support fewer streaming processors but seem to be able to utilize them more efficiently
    • Since graphics cards have their own large memory systems, they should be thought of as a private computer system within your computer.
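
The scalar-versus-vector difference can be illustrated with a toy Python model that only counts steps. It is a sketch under loud assumptions: Python itself does nothing in parallel here, `scalar_add` and `simd_add` are made-up names, and the lane counts are illustrative (4 lanes corresponds to four 32-bit floats in one 128-bit SSE register; 800 matches the streaming-processor count quoted above).

```python
def scalar_add(a, b):
    """Add two lists one element pair per step, as a scalar unit would."""
    result, steps = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        steps += 1
    return result, steps


def simd_add(a, b, lanes):
    """Add `lanes` element pairs per step, mimicking one vector instruction."""
    result, steps = [], 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        steps += 1  # the whole chunk counts as a single step
    return result, steps


if __name__ == "__main__":
    a = list(range(1000))
    b = list(range(1000))
    print("scalar steps:   ", scalar_add(a, b)[1])
    print("4-lane steps:   ", simd_add(a, b, 4)[1])
    print("800-lane steps: ", simd_add(a, b, 800)[1])
```

The answers are identical in every case; only the number of steps changes, which is why work that applies the same operation to many data items suits graphics cards so well.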

Historical chart showing the rapid evolution of video card technology (2006-2012)  

Card              MFR     Shaders (CPUs)  Shader Notes            GPU Technology  GPU Chip   Year  Notes
Radeon x1950 Pro  ATI     8 : 36          Vertex : Pixel          GPU1            RV570      2006  36 pixel shaders
Radeon x1950 XTX  ATI     8 : 48          Vertex : Pixel          GPU1            R580       2007  48 pixel shaders
Radeon HD3870     ATI     320             unified (programmable)  GPU2            RV670      2007  unified (programmable) shaders
Radeon HD4870     ATI     800             unified (programmable)  GPU2            RV770      2008  unified (programmable) shaders
  800/36 = 22 fold increase in only two years
  (growth is currently exceeding Moore's law)
Radeon HD5970     ATI     1600            unified (programmable)  GPU3            RV870      2009  synchronized with delayed release of Windows 7
  1600/800 = 2.0 fold increase in one year
  (growth is currently exceeding Moore's law)
Radeon HD6870     AMD     1536            unified (programmable)  GPU3            Cayman XT  2010
GeForce GTX 560   NVidia  288:48:24       Vertex:Geometry:Pixel   GPU3            GF114      2011  This is the most powerful folder in my collection. Notice the lower (compared to AMD/ATI) number of shaders. Performance is due to architectural differences
Radeon HD7970     AMD     2048            unified (programmable)  GPU3            Tahiti XT  2012
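
The two growth remarks in the chart (800/36 and 1600/800) can be checked against Moore's law with a small Python sketch. Two assumptions are made here and are not from the chart itself: Moore's law is taken in its commonly quoted form of a doubling every two years, and shader counts are treated as a loose stand-in for the transistor counts the law actually describes. The function names are made up for this example.

```python
def fold_increase(old_count, new_count):
    """How many times larger the new shader count is."""
    return new_count / old_count


def moore_expected(years, doubling_period_years=2.0):
    """Growth factor expected from a doubling every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # (label, old shaders, new shaders, elapsed years), values from the chart
    comparisons = (
        ("x1950 Pro -> HD4870", 36, 800, 2),
        ("HD4870 -> HD5970", 800, 1600, 1),
    )
    for label, old, new, years in comparisons:
        actual = fold_increase(old, new)
        expected = moore_expected(years)
        print(f"{label}: {actual:.1f}x actual vs {expected:.2f}x expected")
```

In both comparisons the actual factor is larger than the expected one, which is all the "exceeding Moore's law" remarks claim.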

Note: AMD acquired ATI in 2006 but continued to use the ATI name into 2009

Using your NVidia graphics card to do protein-folding science

My Personal Experience Doing GPU-based Science:
  1. I now run a mixture of systems employing graphics cards from both AMD and NVidia.
  2. Most of my systems employ the HD-6670 from AMD
  3. Two of my systems employ the GTX-560 from NVidia (caveat: see note #6)
  4. When purchased new, the price of the GTX-560 is approximately twice that of the HD-6670, but the GTX-560 seems to be doing 9-10 times more science even though my NVidia cards are installed on older hardware platforms running older operating systems. Now to be fair, when you compare mid-range priced cards between these two companies it would seem that NVidia cards are only doing twice as much science.
  5. It appears that the best bang-for-the-buck always comes from a card whose model number ends in 60 and has a prefix of GTX rather than GE
  6. In 2016 many resource contributors were unable to get work units for GTX-560 on 32-bit versions of Windows-XP (I thought the GPU did all the work). Here is what Stanford published on 2016-07-03:
    FAH tends to push the limits of science and that means that some things can no longer be done with Windows-XP or with 32-bit CPUs. At some point all new projects will require 64-bit and all new projects will require Windows7 or above. The studies of "easy" proteins have been or soon will be completed. I can't predict when that will happen and I doubt anybody else can.
    So it probably makes little sense to continue working with 32-bit OSs. If your hardware is 64-bit capable you might wish to shift to a 64-bit version of Linux
  7. As of 2016 I now recommend the GTX-960

Using your AMD graphics card to do protein-folding science


AMD fubars in 2012

Time and technology never stand still and the same is true for graphics cards. You can imagine the difficulty researchers experience while attempting to keep up with the continual introduction of new products from hardware manufacturers. For the past half-decade the computer industry has been working on heterogeneous technologies (OpenCL, CUDA, PhysX, DirectCompute, etc.) for doing science on graphics cards. Stanford Folding Software requires OpenCL (Open Computing Language) which should not be confused with OpenGL (Open Graphics Library).

Announcement: Stanford to drop GPU2 cards made by AMD

GPU folding on Windows-XP is no longer supported by AMD

Microsoft Windows Scripting and Programming

  1. MS-DOS/MSDOS Batch Files: Batch File Tutorial and Reference
  2. MS-DOS @wikipedia
  3. Batch file @wikipedia
  4. Microsoft Windows XP - Batch files

Experimental Stuff for Windows Hackers and Gurus

Here are some DOS commands for creating, and starting, a Windows Service to execute a DOS script.

sc create neil369 binpath= "cmd /k start c:\folding-0\neil987.bat" type= own type= interact
sc start  neil369

Once created, you can stop/start/modify a service graphically from this Windows location:

                Start >>  Programs >> Administrative Tools >> Services

Stopped services may only be deleted from DOS like so:

sc query  neil369
sc delete neil369

BOINC

BOINC (Berkeley Open Infrastructure for Network Computing)

Combined BOINC Stats: Neil Steven Rieck

BOINC (Berkeley Open Infrastructure for Network Computing) is a science framework in which you can support one or more projects of your choice.

Protein / Biology / Medicine Projects

POEM@home (via BOINC)

Rosetta@home (via BOINC)

World Community Grid (via BOINC)

Personal Comment: I wonder why HP (Hewlett-Packard) has not followed IBM's lead. Up until now I always thought of IBM as the template of uber-capitalism but it seems that the title of "king of profit by the elimination of seemingly superfluous expenses" goes to HP. Don't they realize that IBM's effort in this area is done under IBM's advertising budgets? Just as IBM's 1990s foray into chess playing systems (e.g. Deep Blue) led to increased sales as well as share prices, one day IBM will be able to say "IBM contributed to treatments for human diseases including cancer". IBM's actions in this area reinforce the public's association of IBM with information processing.

Biology Science Links

Protein Data Bank Links

ENCODE

The Encyclopedia of DNA Elements (ENCODE) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active.

Local Links

(noteworthy) Remote Links

Recommended Biology Books (I own them all)


Neil Rieck
Kitchener - Waterloo - Cambridge, Ontario, Canada.