EFFECTIVE INFORMATION VISUALIZATION
Guidelines and Metrics for
3D Interactive Representations of Business Data

 

4 GUIDELINES + METRICS (from the case studies)
4.1 Assumptions
4.1.1 Information Visualization focus
4.1.2 Data
4.1.3 Hardware
4.1.4 Communication
4.1.5 Guidelines Are Not Rules
4.1.6 Metric Assumptions
4.2 Task Knowledge And Data Structures
4.2.1 Goals
4.2.2 Understanding the Task
4.2.2.1 Decision Requirements
4.2.2.2 Communication Requirements
4.2.2.3 Information Requirements
4.2.2.4 Workflow Requirements
4.2.2.5 Gather Artifacts
4.2.3 Data
4.2.3.1 Data Sources
4.2.3.2 Primary Data
4.2.3.3 Supplementary Data
4.2.3.4 Data Calculations
4.2.3.5 Special Cases
4.2.4 Characterize and Rank Data
4.2.4.1 Determine Data Types (Type of Dimension)
4.2.4.2 Multi-Dimensional analysis
4.2.4.3 Rank the dimensions
4.2.4.4 Organize the data into structures
4.2.5 Task and Data Conclusions
4.3 Visualization
4.3.1 Use an organizational device the user already knows
4.3.1.1 LATCH (Wurman)
4.3.1.2 Holmes
4.3.1.3 Axes
4.3.1.4 Guideline: Use overall organizational devices to structure a visualization.
4.3.2 Small Multiples (Glyphs)
4.3.2.1 Wind Example
4.3.2.2 Family Example
4.3.2.3 Small Multiples (Tufte)
4.3.2.4 Trellising (Cleveland)
4.3.2.5 Worlds-Within-Worlds (Feiner)
4.3.2.6 Preattentive Perception (Treisman & Healy)
4.3.2.7 Origins of Written Language (Schmandt-Besserat)
4.3.2.8 Visible Decisions Experience
4.3.2.9 Guideline: Use small multiples to visually convey multivariate data
4.3.3 Reference context
4.3.3.1 Reference techniques
4.3.3.2 Guideline: Always provide a reference context
4.3.3.3 Reference Context and Percentage of Identifiable Points
4.3.4 Number of Data Points and Data Density
4.3.4.1 Bounds
4.3.5 Visualization complexity
4.3.5.1 Guideline: Use redundancy to aid discrimination and comprehension
4.3.5.2 Guideline: Use different visual dimensions differently
4.3.5.3 Guideline: Use connotative mappings
4.3.5.4 Measuring Complexity
4.3.6 Occlusion
4.3.7 Legend, scale and annotation
4.3.7.1 Labeling in charts and scientific visualization
4.3.7.2 Guideline: Use legends, scales and annotations.
4.3.8 Illusions and Misinformation
4.3.8.1 Guideline: Avoid moiré, excessive overlap, breakup and color misinterpretation.
4.3.9 Color
4.3.10 Visualization Conclusions
4.4 Interaction
4.4.1 Do not rely on interaction
4.4.2 Interaction is required to explore data sets
4.4.3 Simple navigation of the scene is imperative
4.4.4 Brushing: User must be able to drill-down to underlying data
4.4.4.1 Identification
4.4.4.2 Verification and Education
4.4.4.3 Guideline: Use brushing for drill-down information and education.
4.4.5 Searching for Thresholds and Individual Items
4.4.6 Interaction permits unforeseen combinations and permutations
4.4.7 Interaction conclusions
4.5 Guideline and Metric Conclusions

4 GUIDELINES + METRICS (from the case studies)
    Information visualizations can convey vast amounts of data to users, as scientific visualizations do. However, since information visualizations are based on abstract data that may be weakly structured and may come from various data sources, the design of the visualizations is not known at the outset. The creation of these information visualizations is therefore aided by referring to guidelines and metrics. These guidelines and metrics are based largely on the case studies and on experience from the development of more than 130 information visualizations, together with observations of use by commercial business users solving real business problems.

    The guidelines and metrics are categorized into three main areas: task knowledge and data structures (section 4.2), visualization (section 4.3), and interaction (section 4.4).

    These categories closely relate to three key components of information visualization.

4.1 Assumptions
    Information visualization is still in an early exploratory phase. As new hardware and techniques become available, guidelines will need to change to reflect this (e.g. early Macintoshes did not support color, and guidelines were later adapted for color). Further, the range of techniques available varies widely, depending on what hardware and software is available. Thus some assumptions are required.

4.1.1 Information Visualization focus
    These guidelines are focused on 3D information visualizations, that is, visualizations that transform abstract data into interactive visual representations. Information visualizations are not:

    Information visualization may often refer to and utilize techniques from each of these areas; as a young discipline, information visualization borrows established techniques from these other disciplines. The guidelines presented here, however, may not be applicable in those other domains.

4.1.2 Data
    The information assumed to be visualized is common everyday information that can be found in data files, tables on web sites, spreadsheets, relational databases, OLAP databases, and legacy systems. The users of this information are assumed to be the users of the information visualizations.

4.1.3 Hardware
    The assumed visualization environment is a desktop PC with only a keyboard and a mouse as input devices. The desktop PC is assumed to be powerful enough to support at least 8-bit color graphics (i.e. 256 colors) at a sufficient resolution (i.e. at least an 800x600 pixel screen) and fast enough to support the necessary computations (e.g. a Pentium-class CPU or better). This was the typical office and home computer of the users of the information visualized in the reference case studies.

    Of the hundreds of visualizations created by Visible Decisions Inc., almost all will run on the above hardware and are intended for users in a corporate office environment with no visualization expertise.

    As a result of these hardware assumptions, various possibilities are not addressed by the guidelines:

    However, goggles, gloves and head tracking devices are not likely to be accepted into the corporate office in the near future and thus are not addressed.

4.1.4 Communication
    Further, although interaction is a very powerful technique essential to the successful deployment of visualization, interaction is not assumed to always be present.

    Interaction, in the context of this paper, is required for exploration of data. However, the limitations of communication media impose constraints on the range of possible visualizations, particularly in terms of interaction. These limitations include:

    As a result, the assumption is made that the visualization, when viewed as a static image from a typical point of view, should be coherent and comprehensible.

4.1.5 Guidelines Are Not Rules
    The following guidelines can be used to create better visualizations. These guidelines are not complete and will continue to change with further experimentation and research. Also, as guidelines, they are not a guarantee of an effective visualization: poor quality data, a poor mapping between data and visual attributes, or the inappropriateness of a visualization to the user's requirements will all result in an unsuccessful visualization. Further, breaking one or more of the guidelines may better solve some goals. The actual circumstances surrounding the goal to be solved must be addressed first.

4.1.6 Metric Assumptions
    It is desirable to convert the guidelines into metrics. Metrics can be used to evaluate a potential design before it is implemented, as well as be embedded into programs for automating the mapping of data to visual representations.

    The metrics presented here assume:

4.2 Task Knowledge And Data Structures
    Task knowledge is an understanding of the user's goals and of the steps and procedures that a user follows to accomplish these goals. Guidelines for clarifying the purpose of the visualization application are more closely related to the application context than to visualization, so various methodologies can be used to uncover the requirements, limitations and resources available to implement the application. These guidelines are itemized requirements that need to be addressed prior to building a visualization.

    For example, a highly experienced developer from a mainframe development environment became an information visualization developer. After developing five information visualizations, he articulated a core challenge of visualization development as a need for the developer to understand the "business" as opposed to "data". In the past, the developer only needed to know the data and the input and output requirements; whereas with visualization he had the potential to work with more than just data. Visualization could address an entire line of business; and, as a result, a visualization developer needed to understand how the business worked in order to be able to create an effective visualization.

    As an example, the Risk Movies visualization (section 3.10) integrates data from two different sources. This data could be presented in two sets of independent charts, but that would lose the business benefit of correlating scenario inputs (cause) with profits and losses (effect).

    Globus and Raible [GR94] specifically guide the scientific visualization developer to learn about the scientific discipline for which a visualization is being created.

    Experience has shown that before an effective information visualization can be created using any of the visualization or interaction guidelines below, a real goal must exist and sufficient information must exist to support achieving that goal:

    Without a clear identification of the goal, it is easy to create an interesting visualization with little value. It is also easier to create a successful visualization with a clearly identified, specific goal than with a weak or general goal.

4.2.1 Goals
    The visualization must set out to solve some specific problem that can be articulated in a few simple sentences as goals. For example: "What is the overall risk in my portfolio? How is that risk distributed across time? How does that risk decompose by customer, by instrument or by country?" or "What was the utilization of my staff over the last year? Where was there a chronic shortage of staff by region, by time of day or by age group?"

    All successful visualizations produced at VDI have proceeded through this fundamental step, starting with the first commercially successful visualization (section 3.1). Note that the Annual Report Visualization (section 3.7) did not have a clear goal and the result was of little value.

4.2.2 Understanding the Task
    Once the goal is identified, there are many possible ways to achieve the goal within various constraints. These can be revealed by considering various requirements of the visualization:

4.2.2.1 Decision Requirements
    Ultimately the visualization must be able to answer the questions stated in the goal. This can be clarified by determining the types of decisions that will result: a highly constrained decision (for example, a stock trader can only "buy" or "sell") is a very different type of decision from "Where should a new store be located?", or from an exploratory analysis used to determine where strong correlations between different products lie.

4.2.2.2 Communication Requirements
    The user community for the visualization greatly affects the resulting visualization. The number of users influences the complexity of the application: typically, an application with more users requires an easier to understand interface to aid the training process. With a larger number of users it is difficult to make any adjustments to the hardware platform, and installation of software may be restricted to applications which conform to certain specifications.

    The type of user influences application design. For example, a senior manager who may only spend 5 minutes once a week reviewing a visualization has different requirements than a business analyst who may spend 5 days reviewing one data set.

    When a visualization is used within a group of users, the communication between the users is important. For example, does each user use the visualization independently or as a group? If used independently, how does one user ensure that an uncovered pattern is seen by other users? If used as a group at the same time and location (i.e. in a room), how does each user get a chance to interact with the visualization independently, or move through a set of standard configurations? Also, the results may be communicated in some static form, such as print or email, forcing further restrictions.

4.2.2.3 Information Requirements
    Identifying the information, as defined by the users, will reveal discrepancies from the data actually available. It will also reveal various secondary sources of information which may not be recorded at all (i.e. expert knowledge which the users share but which is not recorded anywhere), may be stored in ad hoc locations such as spreadsheets, or may reside in other sources such as files and documents.

    It has occurred in practice that critical pieces of information were not stored in any shared location within the computing environment.

4.2.2.4 Workflow Requirements
    Understanding the current workflow is important to designing a successful visualization. Workflow reveals how current artifacts support the nature of the work to be completed. This can reveal:

4.2.2.5 Gather Artifacts
    Current processes utilize a variety of artifacts to share, analyze and present information, such as printed reports, web pages, charts, diagrams, training documents, presentations, etc. These artifacts can reveal insight into current processes and provide clues for a graphic structure for a visualization. The visualization designer can review the current workflow in the context of the artifacts, revealing unnecessary information and missing information in current procedures. The artifacts also show how users organize the information. This organization may have a strong correlation to the mental model with which users internally comprehend the data. These structures are therefore important to the organization of the visualization, which can either support the user's existing mental models or help create new ones.

    The artifacts may also contain graphical presentations such as charts, graphs, diagrams, flowcharts, etc. These existing graphics may be important cues for structuring an effective visualization.

    As an example, in one project for a statistical analysis organization, two simple prototypes of the same information were constructed. The first visualization used a representation of timeseries charts with which the users were already familiar, and grouped the timeseries charts together in a 3D scene based on a hierarchy. The second visualization represented the information as an array of boxes of varying width, height and color, sorted and ordered based on an identified workflow. Out of six users, only one expressed a preference for the second (new) representation, while all the others preferred the first (familiar) representation when adapted to their workflow. The familiar representation was easier for all the users to understand and did not require users to adapt to new models or workflows. Thus, given a choice between a different and a familiar representation, both capable of completing a task, a user group is more likely to choose the familiar representation.

4.2.3 Data
    The following case study from a real-time head trader visualization (section 3.4) illustrates problems caused by an inadequate understanding of the data during the visualization construction process. The visualization showed a 3D floor plan with various markers for individual employees and their stocks, indicating their current status for the day and month as well as their targets and limits. The result was a real-time gauge of the overall effectiveness of the organization's trading floor at any given moment and the ability to rapidly identify and isolate problems requiring immediate attention.

    The key lesson learned was that the business problem was not the same as the data available to support the problem. While data existed to support the solution, numerous transformations on the data were necessary to arrive at data that directly supported the business problem.

    Some errors made were:

    I. A wide variety of secondary and supplementary data were required but not collected immediately:

    A. Floor plan (visualization layout).

    B. Instrument names and codes.

    C. Employee names and department hierarchy.

    D. Employee codes, and portfolio numbers.

    E. Month-to-date data stored in summary batch files.

    F. Target and limit data stored in manager spreadsheets.

    II. Key core data was not adequately understood.

    III. As the visualization development progressed, multiple sessions were required to clarify key data and supplementary data.

    A. Different data fields were not understood in relation to different tasks (e.g. what is a sale vs what is a trade).

    B. Different data fields had to be checked for particular conditions (errors, tests).

    C. Some records needed to be ignored based on business logic (e.g. after certain time stamps).

    D. Methods for the accumulation of data into summaries were not understood.

    Building the visualization before the real data was understood led to the following errors:

    I. The visual scaling was incorrect.

    II. The representation was incorrect for the task.

    III. The data format required changes.

    IV. The calculations required multiple changes.

    V. The scene layout required multiple changes.

    After a number of experiences similar to the above during the construction of other visualizations, Visible Decisions has become more careful when addressing data to be used in visualization. Note that Globus and Raible [GR94] identify issues with data in five out of 14 points in their paper Fourteen Ways to Say Nothing with Scientific Visualization.

4.2.3.1 Data Sources
    First, there is a need to have a high-level view of what data is available and how (or if) it can be accessed. Depending on the amount of data involved and the access method (e.g. network), basic issues of size and speed can be addressed (e.g. "Is there enough bandwidth to access the data in a timely manner?"; "Is there too much data for the typical client to handle?"; etc.).
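    A back-of-the-envelope calculation is often enough to answer the bandwidth question before any code is written. The sketch below uses purely hypothetical numbers for the table size and link speed:

        # Rough feasibility check: can the client pull the data over the network
        # in a timely manner? All figures here are hypothetical placeholders.
        table_rows = 2_000_000
        bytes_per_row = 200
        link_mbit_per_s = 10          # e.g. a shared office LAN

        transfer_s = (table_rows * bytes_per_row * 8) / (link_mbit_per_s * 1_000_000)
        print(f"Full-table transfer: ~{transfer_s:.0f} s")  # ~320 s: too slow per query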

4.2.3.2 Primary Data
    Primary data is fundamental to the visualization. It is often the data (or a transformation of it) represented by the views in the visualization. Often the data sources serving this information serve many different applications, so one must understand which attributes (i.e. members, columns, and fields) contain the data important to the business question and which information is extraneous. This may include ranking the information.

    Visualizations have been rendered useless because of "bad" data. For example, a visualization that automatically normalizes data within its range can squash all the relevant information into too small a range when a few outliers are present in the data. Cleveland [Cle93] regards interactive outlier deletion as a valuable tool within information visualization. Cleveland recognizes that outliers, whether valid or invalid, often exist in data sets and thus gives the user control over them.
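    One defensive technique, sketched below under assumed percentile cutoffs, is to normalize against robust bounds rather than the raw minimum and maximum, so a few outliers cannot squash the rest of the data. This complements, rather than replaces, Cleveland's interactive outlier deletion:

        import numpy as np

        def normalize_robust(values, lo_pct=1, hi_pct=99):
            """Map values to [0, 1] using percentile bounds instead of min/max,
            so a handful of outliers cannot compress the relevant range."""
            lo, hi = np.percentile(values, [lo_pct, hi_pct])
            return np.clip((np.asarray(values, dtype=float) - lo) / (hi - lo), 0.0, 1.0)

        data = np.append(np.random.normal(100.0, 5.0, 500), 10_000.0)  # one wild outlier
        print(normalize_robust(data)[:5])  # normal values still span most of [0, 1]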

    The organization and retrieval of the information is critical. A visualization may require data to be accessed in a different manner than the current data sources were originally designed for. For example, one visualization Visible Decisions constructed queried a multi-million row Oracle database on a multi-processor server. Certain combinations of queries required only one or two minutes to complete while other queries required 20 minutes or more. The resulting visualization used the same visual format for the results of the different queries across six key attributes, but the database had been configured and optimized for queries based on only two of the key attributes. The result was difficult to use because the visualization was sometimes too slow.

4.2.3.3 Supplementary Data
    Information visualizations can address large amounts of data. The organization of that data into a coherent presentation may require various supplementary data. For example, a trading floor visualization (section 3.4) primarily relies on real-time financial market data but also requires a supplementary data set that describes which trader has which stocks at which physical trading desk. To support the fundamental goal, different data sets, potentially from different data sources, may be required. For example, the Risk Movies visualization (section 3.10) requires data from two different sources.

    Various forms of supplementary data may be required, such as:

    For example, VDI has completed half a dozen "plan vs. actual performance" visualizations. Planning data usually resides in spreadsheets, reporting systems, or ad hoc methods. Actual performance data is collected from the business units and is typically automated through relational databases. This large data store with fine-grained detail often becomes the focus of the visualization, and the supplementary data of the plan may be forgotten. As a result, the visualization does not address the goal and the value of the result is compromised.

4.2.3.4 Data Calculations
    Various calculations may be required to transform the data into information of value to the business problem. Interactive calculations form models, such as "what-if" models, and visualization can facilitate the exploration and comprehension of these models. For example, for the stock market, the data available may only include the opening price and the current price, but the user may only be concerned with the percent change between the current price and the opening price, as sketched below.
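    The percent-change example amounts to a one-line derived measure; a minimal sketch:

        def percent_change(opening_price, current_price):
            # Transform the raw fields into the measure the user actually cares about.
            return 100.0 * (current_price - opening_price) / opening_price

        print(percent_change(opening_price=40.0, current_price=41.5))  # 3.75 (%)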

    Time to complete the calculation is also a factor. In some applications created by Visible Decisions, data calculations required:

    In many cases, combinations of pre-computed data and proxies have been used to create estimates with a margin of error acceptable to the user. These applications result in visualizations that have highly interactive models that a user can dynamically explore.

4.2.3.5 Special Cases
    Certain attributes may be inconsistent, missing, or contain special values (such as null values) which require additional handling.

    In many visualizations created at Visible Decisions, there have been special interpretations required for certain conditions on data attributes. These include special interpretations of:

    Also, some values for attributes can be readily identified as invalid data by their range; for example, dates which are outside of the expected range may imply an invalid data object that can be ignored. A sketch of such filtering follows.
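    A minimal sketch of such special-case handling, using hypothetical field names and an assumed valid date range:

        from datetime import date

        def clean(record, earliest=date(1990, 1, 1), latest=date(1999, 12, 31)):
            """Return None for records that should be ignored, else the record itself."""
            if record.get("price") is None:                 # null values need special handling
                return None
            if record.get("status") in ("TEST", "ERROR"):   # flagged test/error records
                return None
            if not (earliest <= record["trade_date"] <= latest):
                return None                                 # out-of-range date: invalid object
            return record

        row = {"price": 41.5, "status": "OK", "trade_date": date(1998, 3, 2)}
        print(clean(row) is not None)  # True: the record passes all checks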

4.2.4 Characterize and Rank Data
    Data characterization is an important step in determining appropriate visual representations. Some visual attributes perform better for certain types of data; for example, hue is effective at classifying among a half dozen different categories, but is ineffective at differentiating among 20 or more categories. Organizing and ranking the data prioritizes it so that the stronger visual dimensions are used to convey the most important attributes.

4.2.4.1 Determine Data Types (Type of Dimension)
4.2.4.2 Multi-Dimensional analysis
    Different data attributes (i.e. columns, fields, members) can be considered different dimensions. The organization of data into dimensions is a task that frequently occurs in the planning of databases, and a similar analysis can be used when designing an information visualization. A dimension is an attribute of an object that can be used to organize all objects of the same type. Examples of these attributes for a person could be age, ethnicity, gender, city, department, etc. Attributes will be either categorical data (e.g. city, department) or quantitative numeric data (e.g. age, birthdate, salary). Quantitative numeric data is typically a "measured" attribute, often referred to as a measure. Measured attributes can be:

    For example, consider a human resources database. The database includes a main table identifying:

    Employee name, number, project, department, division, city, country, salary, job level, job category, length of service.

    The key dimensions in this example are:

    This is only an example. If length of service were particularly important, it could be considered a dimension. A rough characterization of this table is sketched below.
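    The characterization of the table above can be captured in a small sketch. The categorical/quantitative labels follow the example's own classification; which columns count as key dimensions remains a judgment call, as noted:

        hr_columns = {
            "name": "identifier", "number": "identifier",
            "project": "categorical", "department": "categorical",
            "division": "categorical", "city": "categorical", "country": "categorical",
            "job level": "categorical", "job category": "categorical",
            "salary": "quantitative", "length of service": "quantitative",
        }

        dimensions = [c for c, t in hr_columns.items() if t == "categorical"]
        measures = [c for c, t in hr_columns.items() if t == "quantitative"]
        print("dimensions:", dimensions)
        print("measures:  ", measures)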

4.2.4.3 Rank the dimensions
    Depending on the user's goal, different data dimensions will have different priorities in solving the desired goal. Consider a simple sales database containing sales data organized by geographic location, time and product. If the user is most interested in determining which products performed best in which region, a visual organization emphasizing geography and product will be important; whereas an analysis focusing on buying trends for different products will emphasize time and products.

    Figure 4-1: Dimension rank and representation. [Images by author: rankDimensionRegion1.gif, rankDimensionTrend.gif]

    Two different visual representations of the same data, based on different priorities of data dimensions. The first ranks region highest; the second ranks time highest.
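    One hedged way to operationalize ranking is to assign the highest-priority data dimensions to the strongest visual channels first. The channel ordering below is an assumed illustration, not a validated ranking:

        # Hypothetical ordering of visual channels from strongest to weakest.
        VISUAL_CHANNELS = ["x position", "y position", "height", "hue", "width", "orientation"]

        def assign_channels(ranked_dimensions):
            # Pair data dimensions, in priority order, with visual channels.
            return dict(zip(ranked_dimensions, VISUAL_CHANNELS))

        # Emphasize geography and product for the regional-sales question:
        print(assign_channels(["region", "product", "time", "sales volume"]))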

4.2.4.4 Organize the data into structures
    Data must be organized into efficient computational structures to permit rapid retrieval at the times when the user requires rapid (sub-second) feedback. Typically the data is organized into the application program data structures commonly found in computer software, that is:
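    Whatever concrete structures are chosen, the goal is constant-time lookup during interaction. A minimal sketch, assuming sales records with illustrative field names, pre-builds a hash-table index so each drill-down becomes an O(1) dictionary access:

        from collections import defaultdict

        # Hypothetical rows; a real application would load these from its data source.
        records = [
            {"region": "West", "product": "Widgets", "month": "1998-03", "sales": 120},
            {"region": "East", "product": "Widgets", "month": "1998-03", "sales": 95},
        ]

        index = defaultdict(list)
        for rec in records:                       # build the index once, up front
            index[(rec["region"], rec["product"], rec["month"])].append(rec)

        # Sub-second retrieval when the user selects a cell in the scene:
        print(index[("West", "Widgets", "1998-03")])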

4.2.5 Task and Data Conclusions
    An incomplete understanding of goals, tasks, and data can prevent the creation of an effective visualization. To be effective, visualization planning should include:

    With this planning in place a visualization can proceed with a firm basis.

4.3 Visualization
    The range of possible visualizations is large, and some visual solutions will be more effective than others. The following visual guidelines provide a basis for deciding which visual representations are more appropriate and illustrate hazards to avoid. The list is not complete; there are many other potential guidelines that have not been explored. The justification for the guidelines is based on knowledge from other research (where available), the case studies, and visual examples which demonstrate the effectiveness of a particular guideline.

4.3.1 Use an organizational device the user already knows
    The visual representation of the data could look like anything: an extruded map, a star field, blobs, etc. A representation that maps closely to how the user already thinks of the data results in a visualization which is more easily learned (and more likely to be used) than one which does not.

    For example, a number of visualizations developed at Visible Decisions include organizational charts to represent hierarchical data, since most corporate users are familiar with organizational charts. At Visible Decisions, maps (e.g. section 3.5, Sales visualization), floor plans (e.g. section 3.4, Head Trader), grids (e.g. section 3.1, Inventory Viewer), time series (e.g. section 3.9, Timeseries) and rooms (e.g. section 3.10, Risk Movies) have all been used to organize information.

    Also, some information visualizations, such as Visage (section 2.5.2.8) and Trellis (section 2.5.2.3), leverage the familiarity with standard chart and map types (Visage) and scatterplots (Trellis) to permit high dimensional spaces to be explored while using representations that are familiar.

    Thus, graphical artifacts (e.g. charts) with which the user community is already familiar can be leveraged to organize a visualization. A 3D information visualization typically has many more visual attributes available than existing graphical artifacts; therefore the organizational device can be extended, reinforced with redundant encodings, or combined with other graphical organizational devices.

4.3.1.1 LATCH (Wurman)
    Richard Saul Wurman [Wur96] presents numerous case studies of visual information presentation, primarily focused on 2D graphic design. Wurman suggests that there are only five techniques for organizing information effectively:

    Location is a representation based on a known physical organization; for example, maps, floor plans, globes, physical volumes, etc.

    Alphabet is a very specific and effective form of categorization that is easy to access and drill down on. It automatically provides a histogram (e.g. how thick the A section of a dictionary is vs. the Z section). It deviates from traditional categorization in that the organization is arbitrarily imposed (the sequence of the letters of the alphabet) as opposed to arising from categories within the data (e.g. a department store's grouping of products). A generalization of this concept is a sorted ordering based on an arbitrary key.

    Time is a form of continuous dimension; as are height, temperature, stock price, etc. Continuous dimensions are an effective means of organizing information. Time is easily understood (timelines) and flows only in a positive direction. In some cases, time is not a continuous dimension but a sequence, as might be found in a narrative.

    Categorization is a set of discrete groupings occurring within the data. These can be externally classified (e.g. the industry sectors of the stock market), identified analytically (e.g. statistical correlation), or identified on an ad-hoc basis (e.g. search or query).

    Hierarchy is another form of discrete groupings, with successive levels of detail. These multiple levels of summarization can be potentially used in conjunction with any of the previous organizations.

    In addition to Wurman's five organizations, the following have also been used in visualizations at Visible Decisions and do not seem to fit into any of Wurman's categories:

    Graph is a set of objects (vertices) and relationships between them (edges); for example, a network.

    Scatterplot is a set of objects whose location is based on a measure within given data dimensions.

4.3.1.2 Holmes
    Nigel Holmes [Hol96] has presented numerous examples of visual information presentation that he has categorized into four basic chart types:

    From these, many other chart types, enhancements, and embellishments can be derived to create a graphic that delivers a complete communication, from a simple newspaper chart to a detailed graphical inventory.

4.3.1.3 Axes
    As opposed to the user-centric organizational devices described above, this technique is data-centric. The visual organization is defined by choosing key data axes and directly mapping them into corresponding visual dimensions. The visual dimensions typically include:

    The above visual dimensions are intuitively different, and this is supported by perceptual psychology (section 2.2.2) and by visual neurophysiological findings of multiple parallel visual pathways within the human perception system (section 2.3). Although research has not verified this, the existence of the parallel pathways suggests that different information can be perceived simultaneously through the different channels.

    Choice of a few key data dimensions and visual dimensions may provide enough structure so that the rest of the information can readily adapt to this structure. Consider the example of the railroad operations graphics, recurring in a number of Tufte’s books [Tuf83,90] (this example is the WWII top secret Java timetable from [Tuf90]). The visualization maps the horizontal axis to time of day and the vertical axis to distance along the railroad. After these base dimensions have been chosen, trains simply become diagonal lines moving through time and space, with visual features such as slope mapping to speed. Other attributes are simply accommodated, such as train type represented by line types and line styles.

     

    Figure 4-2: Railroad timetable. [Image from [Tuf90]: javaTrain1.gif]

    The mapping of data dimensions into visual dimensions may not relate to graphical artifacts the user is already familiar with, but may map into conventions that a user is already familiar with. For example, in the railroad operations described above, the user may already be familiar with plots of timeseries that place time on the horizontal axis. These timeseries graphs are a frequent graph type used in annual reports, business sections of newspapers, etc. Thus, the railroad example and other time representations can leverage a preconception of time on a horizontal axis with time increasing from left to right. Similarly, the user community may already conceive of the railroad as a one-dimensional axis of distance through the use of textual timetables or strip maps.

    Since there are multiple parallel pathways of visual cognition, as previously shown in both perceptual psychology and neurophysiology (sections 2.2 and 2.3), the mapping of visual axes directly from data dimensions may be effective because data is being organized into visual representations which are processed cognitively in parallel. Thus the viewer’s skill of visual integration from the processed parallel information (red + octagon = "stop") is harnessed to visually correlate multivariate data (green + round = "buy").

4.3.1.4 Guideline: Use overall organizational devices to structure a visualization.
    Decide on overall organizing principles. Use organizational devices that the user is already familiar with, such as existing graphical artifacts and conventions for organizing the current information for problem solving. The choice should depend on what is most important to the user's question; for example, don't use a map if actual location is not important to the goal.

    Often more than one organizational device will be used within a visualization. For example, the visualization below (Figure 4-3) is a room with two walls and a floor. It contains an organizational chart on one wall to select and display the information subset, a map on the floor to select and display information based on geographic region, and time-series charts on the other wall.

    Figure 4-3: Decision Support showing multiple organizational devices. [Image from Visible Decisions: oilCo1.gif]

4.3.2 Small Multiples (Glyphs)
    In early scientific visualization literature, "glyph" and "icon" are used to describe a 3D object which conveys data attributes through its various visual attributes, such as shape, color, location, scale, orientation, etc. It may refer to a simple 3D object (e.g. a cube) or to a complex 3D object made of many parts (e.g. a stick figure with different attributes for the head, body and limbs). This 3D object is then repeated, one 3D object per data object.

    This idea of a glyph is identical to Tufte’s [Tuf83, 90] concept of a "small multiple". Tufte defines a small multiple as:

    "Small multiples resemble frames of a movie: a series of graphics showing the same combination of variables, indexed by changes in another variable."

    A small multiple may be as simple as a colored dot, bar, arrow or cube, or as complex as a surface with spikes on it, a stacked set of cubes, or even a representational object such as a storage tank, a car, or a family.

4.3.2.1 Wind Example
    The following is a simple example using an arrow as the small multiple:

    Figure 4-4: Small multiple wind example. [Image from U.S. government web site http://sfbay7.wr.usgs.gov/wind: sfbay1.gif]

    Wind patterns over San Francisco Bay. Arrows show wind direction and velocity.

    Each arrow encodes:

    The field of arrows is indexed by latitude and longitude. This is made more apparent by the photographic map underneath.
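    A field of arrow glyphs like Figure 4-4 can be sketched in a few lines of matplotlib; the wind components below are synthetic stand-ins for the real measurements:

        import numpy as np
        import matplotlib.pyplot as plt

        # One arrow glyph per (longitude, latitude) station; u and v are synthetic.
        lon, lat = np.meshgrid(np.linspace(-122.5, -122.0, 8), np.linspace(37.4, 37.9, 8))
        u = 5.0 * np.cos(8.0 * lat)          # east-west wind component
        v = 5.0 * np.sin(8.0 * lon)          # north-south wind component

        fig, ax = plt.subplots()
        ax.quiver(lon, lat, u, v)            # each arrow encodes direction and speed
        ax.set_xlabel("longitude")
        ax.set_ylabel("latitude")
        plt.show()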

4.3.2.2 Family Example
    A more detailed small multiple is shown in this example [Hol96] from Social Stratification in the United States:

    Figure 4-5: Small multiple family example. [Image: socialStratification1.gif]

    This is a small portion of "Social Stratification in the United States", first published in 1992 by The New Press and authored by Stephen Rose.

    In this example each small multiple is a person (or pair) over a colored shape where:

    The field of these shapes is organized vertically based on income, and horizontally based on occupation.

4.3.2.3 Small Multiples (Tufte)
    According to Edward Tufte [Tuf83], small multiples result in many tiny graphical objects that can be analyzed individually or compared with each other as a group. Small multiples, according to Tufte, are inevitably comparable, deftly multivariate, high density and efficient in comparison. Tufte also states that visual memory is weak while visual comparison is strong. The use of small multiples permits the user to quickly compare and contrast between small multiples, thus freeing the user from relying on short-term memory and enabling visual comparison, resulting in an overall benefit [Tuf83].

4.3.2.4 Trellising (Cleveland)
    Cleveland [Cle93,95] (described in section 2.5.2.3) makes heavy use of the small multiple concept using a technique referred to as "Trellising" [BCS94]. Trellis displays are matrices of scatterplots used to present multiple simultaneous dimensions of data. Adjacent edges of scatterplots are aligned along the same axes, permitting direct comparison across adjacent cells.

4.3.2.5 Worlds-Within-Worlds (Feiner)
    Feiner [BF93] (described in section 2.5.2.9) reuses a surface plot multiple times, using an initial surface plot as an index and subsequent surface plots for drill-down and comparison. In effect Feiner has created a small multiple of a surface plot.

4.3.2.6 Preattentive Perception (Treisman & Healy)
    All tests for preattentive perception of visual features (section 2.2.2) use small multiples of a visual object that may or may not contain the visual feature being tested. These tests show that small multiples, at least for simple generalized forms, have the capacity to rapidly convey the existence or omission of a feature.

    Further, Healy [HBE96] has shown that small multiples can convey multivariate data in support of more complex goals such as visual estimation. In Healy’s example, the small multiples used hue and orientation to convey type and direction, while latitude and longitude were used to index each of the small multiples.

    While psychology has shown that Healy's results cannot be automatically generalized to other combinations of visual attributes on small multiples, this work does show that fast estimation is feasible and can be demonstrated through experimentation.

4.3.2.7 Origins of Written Language (Schmandt-Besserat)
    The origins of written language were assumed, until recently, to be pictographic.

     

    Figure 4-6: Early writing. [Images from [Sch92]: originOfLang1.gif, originOfLang2.gif]

    Reviewing the origins of written languages reveals very abstract "impressions" at the earliest stages (left). As writing evolved, inscriptions and impressions are found (right), but neither appears to be representational, as a pictographic origin would suggest.

    Denise Schmandt-Besserat [Sch92] postulates, with archaeological evidence and anthropological data, that writing is the outcome of a phase of counting through the use of small physical artifacts called tokens. Tokens were small (1-3 cm) baked clay artifacts of standard shapes, where a particular shape referred concretely to a unit of a particular object (e.g. one cylinder represented one sheep, one incised ovoid represented one jar of oil).

    At the earliest stage, there were a few basic shapes for basic commodities. These tokens were presumably used as counters, and some were found inside clay containers called envelopes that were presumed to be a record of accounting for a transaction or contract.

     

     

    Figure 4-7: Tokens representing quantities of commodities. [Images from [Sch92]: originOfLang3.gif, originOfLang5.gif]

    At the earliest stage of tokens (left), only three basic commodities were represented as abstract forms. Later, more commodities were added to the token system and the forms became more representational (right).

    As the civilization advanced, new shapes were added to the system to count new commodities (e.g. beer, honey, rope, wool, milk, oil, bread, etc.). Also, larger and/or fractional shapes were introduced to adapt to different orders of magnitude (e.g. a cylinder represented a single sheep while a larger cylinder represented a herd of (10) sheep; thus 33 sheep could be represented by 6 tokens instead of 33). Thus variations in scale and form were used to signify orders of magnitude.

    "Compared to earlier tallies (e.g. notches, sticks), the token system provided new ways of handling data... The innovations were twofold: cardinality and object specificity."

    The token system further evolved into both writing and abstract counting (i.e. the separation of the notion of number from the item counted, e.g. "4 musicians" vs. "a quartet"). The tokens evolved through phases of:

    Tokens demand a lower cognitive effort than abstract numbers: each token represents a concrete entity and expresses plurality as it is experienced perceptually. By contrast, abstract counting requires a separation of quantity from object, and this separation requires extra cognitive effort to correlate.

    Tokens wielded power in ancient civilization. Political power relied upon the control of goods, which in turn depended on counting and accounting. Given the criteria for making concise time-critical business decisions based on a variety of measured factors, the presentation of these factors with the least cognitive effort will aid the perception of the information. Thus the careful, controlled use of tokens should be considered within information visualization.

    The use of tokens in early human communication [Sch92] is another example of small multiples, but used in a very different manner: instead of comparing and contrasting among similar tokens, the tokens were used as counters and as a set of signifiers attached to contractual obligations. By visually scanning similar tokens, an early user would be able to quickly determine which entities and what quantities were involved, a task similar to visual search among small multiples.

4.3.2.8 Visible Decisions Experience
    Small multiples are used in many VDI visualizations, ranging from very simple small multiples (e.g. the store cell and cube used in the sales visualizations, section 3.5) to the complex small multiples used in the bid-ask-trade stock visualizations (section 3.2), the head trader visualizations (section 3.4), and the multiple timeseries visualizations (section 3.9). Small multiples have proved to be highly effective in practice when the small multiple is well designed.

4.3.2.9 Guideline: Use small multiples to visually convey multivariate data
    A small multiple is a highly effective means of conveying multivariate data.

    Why is a small multiple effective? Humans have dealt with small multiples for a long time, from the origins of language to picking out a dime from a pile of change or finding a car in a parking lot. It is difficult to quantify why small multiples are effective compared to other representations. It is doubtful that complex small multiples are preattentively perceived, because small multiples tend to be highly dimensional and preattentive perceptual performance typically degrades when different visual attributes interfere with one another. However, small multiples group similar information together by location, which frees the user from symbolic label matching [LS87] and permits the user to quickly compare and contrast between small multiples, thus freeing the user from short-term memory and enabling visual comparison [Tuf83], resulting in an overall benefit.

4.3.3 Reference context
    An object in 3D space can be located using different perceptual mechanisms:

    Objects floating in space are difficult to locate from a single image. A single image occurs when a visualization is printed out, or in front of an audience where only one person has control over the motion of the 3D scene.

    Figure 4-8: 3D scatterplot. [Image by author: scatter1.gif]

    Where are the points in this image? Which ones are close together?

    However, if all these objects refer to a common reference, such as a ground plane, then locating each object relative to the others is possible. In this case it can be seen that points which previously appeared close together have a larger separation than seemed apparent in the first image. Also, it can be seen that the points are laid out at regular intervals along the horizontal axes:

     

    Figure 4-9: Anchored 3D scatterplot. [Image by author: scatter2.gif]

    Any two points can be compared by visually scanning down the anchor line,
    across the base plane and up the comparison point’s anchor line.

    There is no difference in the data in these two images: the only difference is in the visual representation. Thus, apparently simple choices of visual mappings of data can have a tremendous impact on the usefulness of the resulting visualization.

    Although the problems of 3D scatterplots were explicitly identified in the prior work on information visualization, 3D scatterplots frequently recur at visualization conferences in spite of these problems. Within the Proceedings of Visualization 1997 (an annual conference primarily on scientific visualization), a total of 575 images of 3D visualizations appear, of which 28 images do not have a reference context (almost 5%). In one of these examples, scientific visualization expert Holly Rushmeier [RLA97] summarized the lessons learned for 3D scatterplots as:

    The immediate guideline that follows from this is:

    Don’t use 3D scatterplots.

    3D scatterplots and other unanchored representations can be made more effective by providing reference contexts, discussed in the next section.

4.3.3.1 Reference techniques
    Various scientific visualization techniques provide a means of increasing the amount of reference between data points in a data set. Techniques such as isosurfaces, blobs, splatting and volume visualization all aid in the identification and correlation of adjacent data points within a 3D space. These techniques may not be completely sufficient; for example, an isosurface may result in a number of independent blobs which cannot be located relative to each other.

    Figure 4-10: Blobby surface. [Image by author: blob3.gif]

    Although the points on the blobby surface are identifiable with respect to other points on the same surface, it is difficult to locate the various blobs with respect to one another. Is the blob on the right beside the blobs on the left, or is it also further back (i.e. in depth)?

    There are other solutions to locate objects within a 3D space:

    Reference Planes: Every point, blob or object can be anchored to a graphic reference plane such as a ground plane, wall, or arbitrary surface. An anchor line can refer an object to a ground plane (as shown above in Figure 4-9) or to an arbitrary surface (below). Other techniques, such as shadow walls, can also anchor objects.

     

    Figure 4-11: Reference planes and shadow walls. [Images by author: sketch1.gif, shadowWall1.gif]
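    As a concrete sketch of anchor lines (the technique that turns Figure 4-8 into Figure 4-9), the matplotlib fragment below drops a line from each randomly generated point to the ground plane:

        import numpy as np
        import matplotlib.pyplot as plt

        rng = np.random.default_rng(7)
        pts = rng.uniform(0.0, 10.0, size=(30, 3))   # synthetic points in a 10x10x10 volume

        fig = plt.figure()
        ax = fig.add_subplot(projection="3d")
        ax.scatter(pts[:, 0], pts[:, 1], pts[:, 2])

        for x, y, z in pts:                           # anchor line: point down to the z=0 plane
            ax.plot([x, x], [y, y], [0.0, z], color="gray", linewidth=0.5)

        plt.show()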

    Chains: Similar to the scientific visualization technique of "streamlines", chains capitalize on relationships between objects to join related objects together graphically. Examples include timeseries, hierarchies, webs and organizational charts:

     

    Figure 4-12: Reference chains. [Images by author: referenceContextChain2.gif, webviz.gif]

    Successive points in 3D space are linked together by lines, forming chains of points. By constraining the chains (e.g. to be orthogonal, to occur within a plane, to occur between points which are close, etc.), different points in space can be compared to one another.

    Stuart Card [CRM91], an early pioneer of information visualization, introduced the "cone-tree" visualization (section 2.5.2.1). A cone tree is a hierarchy represented as a 3D organizational chart radiating from successive cones. Since each of the cones is oriented similarly and connects to the others in a regular manner, it is possible to compare nodes occurring at different locations within the hierarchy.

    3D graph drawing algorithms (section 2.5.2.5), however, may result in edges oriented at any arbitrary rotation in 3D space. For example, most "spring" algorithms will result in these arbitrary connections (Figure 4-13). As a result, it is impossible to compare nodes, since depth cannot be known until the scene is rotated. For small graphs this is not an issue, since the links explicitly show connections; but in large graphs with many links, the crossings of the links and the occlusions make it difficult to locate and compare nodes.

     

    Figure 4-13: Which nodes are close together? [Images by author: spring1.gif, spring2.gif, spring3.gif]

    If points on different chains are anchored to a common reference in a predictable, ordered fashion (such as cones all aligned along one axis, or links only following the x, y and z axes), then multiple chains may be compared to one another.

4.3.3.2 Guideline: Always provide a reference context
    A reference context is required to permit spatial comparison of different visual objects. The initial Multiple Timeseries Visualization (section 3.9) had neither a sufficient reference context nor interactions suitable for creating one, and was of little value. Later, successful variants of this visualization embedded stronger reference contexts such as grids, planes, and alignments of points in time or of complete timeseries. Also, an initial version of the LME visualization (section 3.2.3) suffered from a lack of reference context and was unusable.

    Justification: Without a reference context, a 3D visualization is difficult to use for comparing objects based on location, especially if this comparison is to be done from a static image. Since static images are often used to distribute or discuss the results of a visualization, a reference context is necessary.

4.3.3.3 Reference Context and Percentage of Identifiable Points
    If the location of a graphic object cannot be identified, then the user may not be able to comprehend some of the values represented by that object. A visualization should provide a reference context that can be comprehended from most viewpoints. Techniques from other domains such as CAD are applicable, for example shadow walls and anchor lines.

    Percentage of identifiable points = (number of visible data points identifiable in relation to every other visible data point) / (number of visible data points)²

    Scatterplot Example: For the typical 3D scatterplot, as previously shown, it is difficult to establish the depth of different data points, even if multiple simultaneous viewpoints are available. Typically, interaction or animation overcomes this, but such interactions can be difficult for novice users and are not available in print. A better solution is to draw a line from each data point to a reference plane, as previously shown. Before the lines are added, the percentage identifiable is close to 0%; after the lines, it is close to 100%.

    Blob Example: Blobby visualizations frequently have multiple blobs within the scene. All the points on a given blob are identifiable in relation to each other, but the blobs are not identifiable in relation to each other. For example, a visualization with two similar-sized blobs will have a percentage of identifiable points near 50%, since the points on one blob cannot be adequately referred to the points on the other blob.

    Table 6-1 in the Appendix shows the Percentage of Identifiable Points for a number of published images of 3D visualizations. As can be seen from this sampling, numerous visualizations are published with images that cannot be adequately interpreted.

    A quick evaluation of a visualization's reference context is thus the percentage of identifiable points: if this value is low, other techniques for increasing the reference context should be considered. A sketch of the computation follows.
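    The metric can be computed directly once each visible point is labeled with the reference group (blob, anchored plane, etc.) within which it is locatable. The sketch below treats points as mutually identifiable only within the same group, an assumption a particular scene may refine:

        from collections import Counter

        def pct_identifiable(group_ids):
            """Fraction of ordered point pairs identifiable in relation to each other."""
            n = len(group_ids)
            counts = Counter(group_ids)
            identifiable_pairs = sum(c * c for c in counts.values())
            return identifiable_pairs / (n * n)

        # Two similar-sized blobs, as in the blob example above: the metric is near 50%.
        print(pct_identifiable(["blob A"] * 10 + ["blob B"] * 10))  # 0.5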

4.3.4 Number of Data Points and Data Density
    Tufte's [Tuf83] metric of data density is applicable to 3D information visualization. The general assumption is that visualizations with a low number of data points (or with a low amount of data per square centimeter) are not effective, because simpler representations may serve just as well. Representations with high densities present a large amount of information for visual comparison.

    Since many 3-dimensional visualizations are displayed on high-resolution displays of similar resolutions (e.g. 1280 by 1024 pixels), one could ignore the denominator and focus only on the number of data points. The number of data points is of particular interest if upper and lower bounds can be established; for example, a useful guideline might be the statement: "For fewer than 20 data values, use an alphanumeric display".

    Number of Data Points is the number of discrete data values represented on screen at an instant. For example, a typical 2D time series line chart with 20 intervals along the time axis has 40 data points (20 two-coordinate pairs).

    Data Density is the number of data points divided by the number of pixels in the display, where the number of pixels does not include the pixels in the window borders, menus, etc.
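    Both metrics restate directly as code; the plot-region size is an assumed input:

        def data_density(num_data_points, plot_width_px, plot_height_px):
            # Data points per pixel of the plot region (borders and menus excluded).
            return num_data_points / (plot_width_px * plot_height_px)

        # The 20-interval time series above (40 data points) in an 800x600 plot region:
        print(f"{data_density(40, 800, 600):.2e} data points per pixel")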

4.3.4.1 Bounds
    Visualizations at VDI have ranged from 200 to 100,000 data points. Table 6-1 in the appendix shows the number of data points for various visualizations.

    Lower Bound:

    Visualizations with fewer than 500 data points can be questionable: the same information can often be represented in other (simpler) forms, such as tables or 2D charts. For example, a time series with 250 time intervals and 250 corresponding measurements can be represented in a 2D line chart, and a 12 x 12 array of values can be represented as a 2D array of bars or a "heat map" (a 2D grid of squares with each square's hue mapped to a data value).

     

    Figure 4-14: Lower bound. [Images by author]

    400 daily closing stock price and volume values are shown in a typical 2D price and volume chart (top) and a 3D chart (bottom). The correlation between major changes in price and volume is visually apparent in the 2D representation. The 3D layout does not make the data any clearer, and some volume bars projecting backwards obscure other volume bars.

    This is not a fixed lower bound. The lower bound threshold depends on:

     

    Upper bound:

    Visualization can deliver overwhelming amounts of information, and a simple upper bound has not been established. A trivial upper bound is the number of pixels in the display (a little over 1,000,000). This is well beyond the range that any visualization at Visible Decisions has shown at one instant. Some early experimentation with individual pixels [Bra91] did not prove effective.

    The number of data points does not directly correlate with perceived complexity or effectiveness. Applications with 50,000 to 100,000 data points (e.g. 100 time series each with 500 time intervals) have been perceived as effective while other applications with 2,000 data points (e.g. an application with 500 objects with 5 values each) have been perceived as too complex.

    Instead, a metric that measures visualization complexity is discussed in the next section.

4.3.5 Visualization complexity
    What makes some visualizations difficult to comprehend? Why are they perceived as being complex? Is it related to the number of different dimensions simultaneously displayed? To the familiarity of the representations? To the number of visual attributes that vary? These questions are all about the complexity of a visualization.

    The same question can be asked of complex 2 dimensional graphics. Tufte [Tuf83] notes that complexity is one of the issues of a multivariate design task: the design task is, in fact, to make a clear portrayal of complexity.

    Complexity is difficult to measure. Other fields of research have attempted to establish measures of complexity; for example, software complexity. It can be difficult for professionals to agree on measures: various measures of software complexity have been proposed, such as number of lines of code, depth of nesting, average length of subroutine, etc. Users will find some utility in these imprecise measures; for example, the number of lines of code is frequently used by professionals to quickly gauge software module complexity.

    Complexity in a visualization is likewise difficult to measure. What is easy to comprehend vs. what is difficult depends in part on the user’s familiarity with the given graphical domain. Larkin and Simon [LS87] have previously shown how different users interpret the same visual very differently, depending on their expertise. For example, expert chess players have acquired processes to see empty columns of space while novices only see empty squares.

    The appropriate level of complexity and density of the information in the visualization is dependent on:

    a. User: For example, executives who will use a visualization 10 minutes every day need a simple, less dense visualization than an analyst who may use the same visualization for a week.

    b. Data type: Homogeneous data can be more densely represented than highly multidimensional, loosely correlated data.

    Information visualizations developed at Visible Decisions Inc. range from a lower bound of 500 data points to 100,000 data points displayed simultaneously. Scientific visualizations often map closely to a real 3D physical model on top of a dense homogeneous array of data and may display up to 1,000,000 simultaneous data points out of a much larger underlying dataset. The metrics section discusses this in more detail.

  • 4.3.5.1 Guideline: Use redundancy to aid discrimination and comprehension
  • Tufte [Tuf83] advocates a "less is more" approach to graphical design, which he describes as reducing the non-data ink and maximizing the data ink. In terms of 3D visualization, this might be interpreted as reducing the number of points, polygons and colors that do not convey data information. A naïve extension of "less is more" to 3D can result in minimalist visualizations.

    A minimalist visualization (where one data attribute maps to only one visual attribute) may be difficult to comprehend. It may result in overlap, or it may not map the data measure to an appropriate connotative measure. Thus a mapping which contains some redundant or extra geometry may result in an easier-to-use visualization. A mapping which maps the same data attribute onto multiple visual attributes may aid the user’s comprehension and learning of the visualization by providing multiple channels to deliver the information, with the channels reinforcing one another and helping the user understand the corresponding mappings.

    The following sequence of four images shows the same data set: one hundred objects, each with two attributes (A and B), placed on a 10 by 10 grid:

    Figure 4-15: Rectangles with no redundancy.

    Image by author. <img src="redundancy1.gif">

    No redundancy or "extra ink". Each rectangle shows variable A by its height and variable B by its width. The edges are impossible to discern. The view position could be modified such that each rectangle is distinct, but this would depend on the range of the values and burden the user with the effort of finding an appropriate view.

     

    Figure 4-16: Rectangles with redundant arbitrary depth

    Image by author. <img src="redundancy2.gif">

    Rectangles are transformed into boxes with an arbitrary depth. This helps to identify the edges between boxes in close proximity.

     

    Figure 4-17: Rectangles with redundant red/green encoding.

    Image by author. <img src="redundancy3.gif">

    Rectangles are colored based on variables A and B, in addition to height and width based on variables A and B. The color helps discriminate. Color also reinforces the mapping of the data variables (tall thin rectangles are red… big squares are yellow… etc.) and aids the user’s interpretation of the color space, potentially increasing comprehension.

     

    Figure 4-18: Red/green squares with no redundancy.

    Image by author. <img src="redundancy4.gif">

    Only color is used to convey variables A and B. The user is now burdened with determining how the color space correlates with the variables. What combination of red and green makes brown?

    Redundancy can help the user become familiar with different visual dimensions.

    Justification: The above image sequence shows examples where the extra "redundant" visual information provides value. The model presented by Lohse (section 2.4.5.3), as well as actual experimentation, shows significant improvement for a graphic display in which color redundantly encodes the same information as shape, over a graphic display encoding the information using shape only. An example from preattentive perception also found improved search performance in a display containing additional geometry.

    Figure 4-19: Additional shape in the second image improved search performance.

    Image by author. <img src="additionalBraces.gif">

    The scatterplot example at the beginning of this paper (section 1.1) also provides a strong illustration supporting this concept: the extra lines do not map any additional data into the scene, but provide additional information to the user. Foley and Van Dam [FV82] (section 2.4) recommend redundant coding to increase discrimination; for example, using both shape and color allows the user to use either one or both coding methods. Redundant encoding is used in traditional 2D graphical presentation: for example, the size and symbol for cities on roadmaps redundantly reinforce each other, as do color and line style for roads.

  • 4.3.5.2 Guideline: Use different visual dimensions differently
  • Use different visual dimensions (e.g. size, location, orientation, form, color, transparency, motion) for different types of data. This includes different characterizations of data (e.g. enumerated, text, continuous, discrete) as well as connotative mappings of data to representation (discussed in the next section).

    Some visual dimensions must normalize all the data into a restricted range. For example, orientation of an element may be restricted to 270 degrees of rotation (otherwise an element rotated 0° appears the same as an element rotated 360°); similarly, brightness must be mapped into a 0 to 100% range.
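    A minimal sketch of such a normalization (the linear mapping and the example ranges are assumptions for illustration):

```python
# Linearly map a data value into a restricted visual range, e.g.
# orientation limited to 0-270 degrees so that the minimum and
# maximum data values remain visually distinct.

def normalize(value, data_min, data_max, vis_min=0.0, vis_max=270.0):
    fraction = (value - data_min) / (data_max - data_min)
    return vis_min + fraction * (vis_max - vis_min)

print(normalize(50, 0, 100))            # 135.0 degrees of rotation
print(normalize(75, 0, 100, 0.0, 1.0))  # 0.75, i.e. 75% brightness
```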

     

    Figure 4-20: Positive vs negative differentiation mapped to height of an object.

    Images by author. <img src="posneg1.gif"> <img src="posneg2.gif"> <img src="posneg3.gif"> <img src="posneg4.gif">

    Positive and negative values are mapped to the height of an object. In the left image, there is insufficient context to differentiate between the cube with the positive value and the cube with the negative value (the right cube may be in the foreground or to the left of the other cube). The next three images disambiguate the relationship by adding a reference context, adding color, or using a shape that differentiates direction, such as a cone or arrow.

    Some visual dimensions do not adequately map data attributes that range into negative values. For example, how does a pie chart represent a negative value? Similarly, a cube may not differentiate between positive and negative heights (it looks like a cube in both cases) whereas a cone will differentiate between positive and negative heights based on the direction it is pointing. If the developer uses these visual dimensions for data with negative values, then the developer must create a suitable differentiation.

    Color does not map well to an enumerated data type where there are more than approximately 10 enumerations; for example, 20 unique data categories do not map effectively to 20 unique colors but can map to 20 unique locations (see section 3.6 for an example).

    Bertin [Ber83] (section 2.1.2.3) has produced an excellent categorization of the perceptual variables (form, orientation, color, thickness, brightness, texture and size) and the types of visual analysis for which these are effective:

     

    Table 4-1: Bertin’s Matrix of effective perceptions by visual dimension and task

     

    | Visual Dimension   | Association (perceived as similar) | Selection (perceived as different) | Order | Quantity (perceived in proportion to each other) |
    |--------------------|------------------------------------|------------------------------------|-------|--------------------------------------------------|
    | Size               | X                                  | OK                                 | OK    | OK                                               |
    | Brightness (Value) | X                                  | OK                                 | OK    |                                                  |
    | Texture            | OK                                 | OK                                 | OK    |                                                  |
    | Color (Hue)        | OK                                 | OK                                 |       |                                                  |
    | Orientation        | OK                                 | OK*                                |       |                                                  |
    | Shape              | OK                                 |                                    |       |                                                  |

    * Orientation is effective for differentiating points and lines, but not areas.
    Adapted from Semiology of Graphics [Ber83] p. 96


    Cleveland and McGill [CM84] have observed that people accomplish the different perceptual tasks associated with interpreting quantitative graphical displays of information with different degrees of accuracy. In order of decreasing accuracy they rank:

    1. Position along a common scale
    2. Position along identical, non-aligned scales
    3. Length, direction, angle
    4. Area
    5. Volume, curvature
    6. Shading, color saturation

    Items at the same level are considered to have the same degree of accuracy. Note that Cleveland and McGill focused on quantitative data types (Bertin refers to the quantitative data type as the ratio and interval levels of measurement, in section 2.1.2.3; this has also been referred to as the continuous data type, in guideline section 4.2.4.1).

    Mackinlay [Mac86] has extended Cleveland’s work to three levels of measurement: quantitative, ordinal and nominal. Mackinlay contends that some perceptual tasks are better applied to different levels of measurement. In the following table, Mackinlay sets out rankings for the three levels of measurement:

     

    Table 4-2: Mackinlay’s Rank of Visual Attributes by Data Type

    | Rank | Quantitative         | Ordinal              | Nominal              |
    |------|----------------------|----------------------|----------------------|
    | 1    | Position             | Position             | Position             |
    | 2    | Length               | Density (Brightness) | Color Hue            |
    | 3    | Angle                | Color Saturation     | Texture              |
    | 4    | Slope                | Color Hue            | Connection           |
    | 5    | Area                 | Texture              | Containment          |
    | 6    | Volume               | Connection           | Density (Brightness) |
    | 7    | Density (Brightness) | Containment          | Color Saturation     |
    | 8    | Color Saturation     | Length               | Shape                |
    | 9    | Color Hue            | Angle                | Length               |
    | 10   | Texture              | Slope                | Angle                |
    | 11   | Connection           | Area                 | Slope                |
    | 12   | Containment          | Volume               | Area                 |
    | 13   | Shape                | Shape                | Volume               |

    Figure 4-21: Ranking of perceptual tasks by data type (level of measurement).
    Adapted from [Mac86].

    These visual rankings can be a useful guide for determining which data attribute to map to which visual attribute, assuming that the data dimensions have been ranked (section 4.2.4.3); this approach has already been described in general terms in section 4.3.1.3.
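    As a rough sketch of how such rankings could be applied (the abbreviated ranking lists and the greedy assignment strategy are illustrative assumptions, not an algorithm from [Mac86]):

```python
# Greedy assignment of ranked data dimensions to visual attributes,
# using abbreviated versions of the rankings in Table 4-2.
RANKINGS = {
    "quantitative": ["position", "length", "angle", "slope", "area"],
    "ordinal":      ["position", "brightness", "saturation", "hue"],
    "nominal":      ["position", "hue", "texture", "connection"],
}

def assign_attributes(dimensions):
    """dimensions: (name, level of measurement) pairs, most important
    first. Each takes the best-ranked attribute not already used."""
    used, mapping = set(), {}
    for name, level in dimensions:
        for attr in RANKINGS[level]:
            if attr not in used:
                mapping[name] = attr
                used.add(attr)
                break
    return mapping

print(assign_attributes([("profit", "quantitative"),
                         ("priority", "ordinal"),
                         ("region", "nominal")]))
# {'profit': 'position', 'priority': 'brightness', 'region': 'hue'}
```

    As discussed below, such a mechanical assignment ignores interactions between visual dimensions and should only be a starting point.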

    Within Visible Decisions visualizations, these rankings are used to some degree:

     

    Visual ranking does not account for cultural or psychological connotations (discussed in the next section) nor does it account for interactions between the visual dimensions. Using the above rankings in a strictly automated fashion will have unintended side effects – visual dimensions interact with each other and can lead to results that cannot be properly perceived. Consider the following two examples:

     

     

    Figure 4-22: Examples of confounding visual attributes.

    Image by author. <img src = "confoundingVisualAttributes.gif">

    Example A (left): Size confounds Form. Example B (right): Brightness confounds Hue

    In example A, both size and form are used to convey data attributes. However, as size approaches zero, it becomes more difficult to discriminate between the different forms. In example B, both brightness and hue are used. However, as brightness approaches zero, it becomes impossible to discriminate between hues.

    Justification: As many information graphic researchers [Ber83, Mac86, CM84] have found, using appropriate visual dimensions for the task is critical to effective information visualization. At Visible Decisions, visualizations such as the trade routing visualization (Figure 3-12) revealed difficulties when using visual dimensions inappropriately.

    The guideline for using visual dimensions is to use different visual dimensions differently. Using ranked visual dimensions may be a useful starting point for mapping data into visual attributes; however, the interactions between the different visual variables and cultural connotations need to be addressed. This is an area for future work in other research disciplines.

  • 4.3.5.3 Guideline: Use connotative mappings
  • Data measures can be mapped to visual dimensions based on connotative properties of the data measure. For example:

    The following 2 images are from visualizations that display very similar information but represent the information with very different mappings.

     

     

    Figure 4-23: Arbitrary vs connotative mappings of data to visual attributes.

    Images by author. <img src="sp500bad1.gif"><img src="sp500good1.gif">

    The first visualization arbitrarily maps data values to visual attributes. The second visualization maps data values to connotative representations; for example:

    Users presented with these visualizations for the first time were much more adept at identifying meaningful patterns with the connotative visualization and had more difficulty interpreting the arbitrary mapping.

    Cultural conventions also carry connotations. These must also be considered in the design. Cultural connotations can be simple, e.g. red and green do not necessarily map to bad and good respectively in all cultures; or more subtle, e.g. one client took offense at the coloration of markers corresponding to various countries because yellow had been arbitrarily assigned to Japan.

    Connotative mappings are used in many visualizations at Visible Decisions. For example, WebViz (Figure 3-21) maps brightness of the small multiple to the age of the document, size of the small multiple to the size of the document, and directly maps an image of the document to the front of the small multiple. Simple connotative mappings, such as green and red used for good and bad respectively, are common in the most popular VDI visualizations, such as Risk Movies (Figure 3-16) and Sales (Figure 3-11).

    Justification: Without using connotative mappings, there is more effort on the part of the user to remember the mapping - the user must explicitly recall the mapping for each data attribute into its corresponding visual dimension.

    In the domains of virtual reality and scientific visualization, Fred Brooks [Bro88] (section 2.4.4.1) identified in early research that the appropriate selection of metaphors substantially helps users to define issues and make consistent decisions. A number of scientific visualization techniques are based on simple connotative mappings that can be quickly understood. For example, streamlines and hedgehogs (oriented lines in a 3D space indicating direction of fluid flow) are used to represent fluid flows in visualizations of computational fluid dynamics. Similarly, isosurfaces (lumpy surfaces in 3D space indicating a boundary, such as the edge of a cloud) are used to represent boundary conditions in 3D volumes.

    In the domain of information graphics, many of Tufte’s [Tuf83,90,96] examples of small multiples use connotative small multiples. Nigel Holmes [Hol93] also showcases many examples with highly connotative mappings, such as Figure 4-5.

  • 4.3.5.4 Measuring Complexity
  • A complex visualization is more difficult to comprehend than a simple one. An effective visualization should seek to reduce complexity.

    Visualizations frequently represent multivariate data. One hypothesis is that the greater the number of dimensions which are displayed in the visualization, the greater the cognitive complexity for the user. For example, a simple 2D chart representing many different variables may use many different colors and/or patterns resulting in more cognitive effort on the part of the user to remember the mapping between the data dimensions and the representation.

  • Number of simultaneous dimensions
  • A simple measure of cognitive complexity is the number of different data dimensions displayed simultaneously (assuming that the number of dimensions is an indicator of complexity).

    Visualizations at Visible Decisions have ranged from 3 to 150 data dimensions simultaneously displayed in the visualization. Experience indicates that it is difficult to design suitable representations for higher-dimensional problems, which implies that the lowest number of dimensions that solves the task may be desirable.

    It may seem non-intuitive to show more than 3 dimensions of data in a three-dimensional visualization; however, there are numerous strategies for displaying high-dimensional data:


    Figure 4-24: Parallel coordinates.

    Image by author. <img src = "parallelCoordinates.gif">

    20 different dimensions of airline data shown as a parallel coordinate representation. Each different data dimension is shown as a vertical axis, with the value for each airline plotted for that axis. Successive values for each airline are connected with a color coded line.

    Figure 4-25: Stacked coordinates.

    Image from Visible Decisions. <img src="parallelCoordinates3D.gif">

    40 different quantitative data dimensions are shown as squares on vertical stacks. The size of the square is based on the quantitative data. The data dimensions can be re-ordered and applied to the stacks (i.e. re-order the squares on the stacks). This facilitates visual comparison of different stacks to each other (i.e. stacks with similar profiles show similarities across many dimensions). Stacks are located on a 2D plane based on any 2 data dimensions.

    Table 6-2 in the appendix lists the number of dimensions for a variety of visualizations. One observation from this table is that visualizations which are seemingly intuitive, such as the Risk Movies visualization (Figure 3-16), may have high numbers of dimensions, while visualizations considered difficult, such as SP 500 (Figure 3-7), may have low numbers of dimensions. Thus a simple "number of dimensions" score is an insufficient measure of complexity and requires refinement, such as the refinements discussed in the next sections.

  • Maximum of the number of dimensions for each separable representation
  • When the number of dimensions was reviewed for existing visualizations, many of the visualizations had higher numbers of dimensions than intuitively thought. These visualizations with high numbers of dimensions were often compound visualizations. A compound visualization results from two (or more) spatially distinct data representations, each of which can be understood independently, but which can be used together to correlate information in one representation with that in another. For example, the representations on each of the walls in the Risk Movies visualization (Figure 3-16) can be understood independently and address separate tasks, such as "what is the distribution of profits by scenario?" or "what are the parameters for a particular scenario?" Tufte’s example of the dot-dash-plot [Tuf83, p.133], which combines a bivariate distribution (scatterplot) in the center with marginal distributions along the axes, is an example of a compound visualization, as is Tufte’s example of a Java railroad timetable [Tuf90, p.24]. Thus the raw number of dimensions is not a useful measure, particularly when comparing alternative compound visualizations.

    This measure, the maximum of the number of dimensions from each separable representation, calculates the number of dimensions for each separable representation based on the task (i.e. the typical task, or the most complex task if the visualization solves many tasks).

    For example, a six-dimensional data set can be represented with 3 separate scatterplots. If only the two dimensions within a scatterplot are compared at any one time, the resulting maximum score is 2. A task requiring comprehension of all six dimensions simultaneously requires correlating between the 3 scatterplots simultaneously, and has a score of 6. Thus, the "maximum of the number of dimensions from each separable task representation" is dependent on the definition of the task.
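    A minimal sketch of these two counts (the data structures are illustrative):

```python
# A compound visualization modeled as a list of separable
# representations, each listing the data dimensions it displays.

def total_dimensions(representations):
    """Number of distinct data dimensions displayed simultaneously."""
    return len({dim for rep in representations for dim in rep})

def max_separable_dimensions(representations):
    """Maximum of the number of dimensions per separable representation."""
    return max(len(set(rep)) for rep in representations)

# Three scatterplots together showing a six-dimensional data set:
scatterplots = [["a", "b"], ["c", "d"], ["e", "f"]]
print(total_dimensions(scatterplots))          # 6
print(max_separable_dimensions(scatterplots))  # 2
```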

    Table 6-2 lists the maximum of the number of dimensions from each separable task for many visualizations.

  • Connotative representations + dimensional score
  • Neither of the above "dimensional counts" considers the effectiveness of the representation. An effective mapping of data dimensions into a visual representation requires little explanation. A poor mapping results in a visualization that requires repeated explanation. For example, connotative mappings are more readily understood and remembered than non-connotative mappings. Some visualizations at Visible Decisions have been made worse as a result of a redesign because connotative representations were not considered.

    The following model, created by the author, is a simplistic model of cognitive complexity based on a scoring scheme. Visualizations are evaluated for the effectiveness of the mapping of each data dimension into a visual dimension (e.g. length, width, hue, position, orientation, shape, etc.). This simplistic generalization does not take into account many factors but:

    Dimensional Score:

    The scoring measures the effectiveness of the mapping from the data dimension to the visual dimension. A mapping fits into one of 4 categories (from worst to best): many-to-one, one-to-one general, one-to-one intuitive, and preexisting understood representation, each described below.

    Each mapping has an associated score, based on a simple model of the cognitive effort to recall the mapping. For example, the 1-to-1 general mapping has a cognitive score of 2: 1 point for the user to recall the data dimension and 1 point for the user to recall the visual dimension that it maps to. The sum of all the scores for all the dimensions within a visualization is the dimensional score.

    The desired result is the lowest possible score.

    Many-to-one mapping:

    Some of our visualizations overloaded the hue dimension: different hues would represent positive and negative values, different data types, etc.

    Figure 4-26: Example of overloaded hue dimension.

    Image by the author.

    Reliance on one visual dimension to convey multiple data dimensions has a severe negative impact on the effectiveness of the visualization. In some cases we were required to redesign the mapping.

    We assign the n-to-1 mapping a score of 3 x n, assuming a simple cognitive model in which, for each data dimension, the user must recall the data dimension, recall the shared visual dimension, and disambiguate which data dimension that visual dimension is currently conveying; this effort is multiplied by the number of data dimensions, since the recall is required for each data dimension.

    One-to-one general mapping:

    This is a common mapping found in visualizations. For example, profit maps to height, or priority maps to color, etc. The score is 2: 1 for data dimension recall, 1 for visual dimension recall.

    One-to-one intuitive mapping:

    Some data types map more appropriately to some visual dimensions than others. We assume that data dimensions mapped to visual dimensions with similar connotations are easier to recall. For example, a data value "size" maps to the visual dimension "size" well. Or spatial data dimensions map well to spatial visual dimensions. We assign this mapping a score of 1: the visualization dimension is implied by the data dimension and does not need to be recalled explicitly.

    Preexisting understood representation:

    A widely understood visual representation with its frame of reference is a representation that is automatically understood within the task domain. For example, latitude and longitude overlaid on a map is automatically understood. Depending on the task and the user community, we also find other representations are automatically understood:

    We assign these a score of 0; e.g. there is no cognitive effort to recall that a map is a map.

    Scoring Summary:

    | Mapping                    | Score |
    |----------------------------|-------|
    | n-to-1                     | 3 x n |
    | 1-to-1 general             | 2     |
    | 1-to-1 intuitive           | 1     |
    | preexisting representation | 0     |
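    A minimal sketch of this scoring scheme (the category names and data structures are illustrative):

```python
# Dimensional score: sum of per-mapping cognitive-effort scores,
# following the scoring summary table above.

def mapping_score(category, n=1):
    scores = {"n-to-1": 3 * n, "general": 2, "intuitive": 1,
              "preexisting": 0}
    return scores[category]

def dimensional_score(mappings):
    """mappings: list of (category, n) pairs; n matters only for
    'n-to-1' mappings (the number of overloaded data dimensions)."""
    return sum(mapping_score(category, n) for category, n in mappings)

# Visualization A from the stock example below: seven intuitive
# mappings plus one general mapping (stock -> location) = 9.
viz_a = [("intuitive", 1)] * 7 + [("general", 1)]
print(dimensional_score(viz_a))  # 9
```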

     

    Stock Visualization Example: A stock visualization was redesigned to reduce redundant geometry. We showed both visualizations to various people with a reasonably consistent reaction. Most people comprehend the information in stock visualization A and are able to pick out meaningful patterns without prompting:

     

    Figure 4-27: Connotative stock representation.

    Image by author. <img src="sp500good2.gif">

    While many people viewing visualization B struggle to maintain the mapping:

     

    Figure 4-28: Arbitrary stock representation.

    Image by author. <img src="sp500bad2.gif">

     

    The dimensional scores for these visualizations are summarized in the following table:

     

     

    | Data         | Visualization A (2 cubes + line) | Score A | Visualization B (1 cube) | Score B |
    |--------------|----------------------------------|---------|--------------------------|---------|
    | Stock price  | Height of line                   | 1       | Height                   | 1       |
    | Buy price    | Vertical position of red cube    | 1       |                          |         |
    | Sell price   | Vertical position of blue cube   | 1       |                          |         |
    | Buy size     | Size of red cube                 | 1       |                          |         |
    | Sell size    | Size of blue cube                | 1       |                          |         |
    | Total volume |                                  |         | Width                    | 2       |
    | Block volume |                                  |         | Depth                    | 2       |
    | Spread       | Distance between cubes           | 1       | Hue                      | 2       |
    | Liquidity    | Size of both cubes               | 1       | Brightness               | 2       |
    | Stock        | Location                         | 2       | Location                 | 2       |
    | Total score  |                                  | 9       |                          | 11      |

    Even though the two visualizations did not contain the same data, the result was counter to original intuitions. Visualization B contained less data and less geometry, yet was more difficult to understand. Even if one considers only the common components between the two visualizations (price, spread, liquidity, and stock), the score for B is worse (7 vs. 5). Visualization B was more difficult for the user to understand because simple connotative mappings had been replaced with a constraint to minimize extraneous geometry, and this resulted in a complex cognitive mapping.

    This measure does not adequately capture complexity across very different visualizations. The ordering that results from this score does not correspond with subjective interpretations of complexity for the same set of visualizations: thus it cannot be generalized (i.e. it is not a metric [Fen96]). This metric does not scale across visualizations, particularly when comparing across compound visualizations (section 4.3.5.4.2). A compound visualization made up of many simple fairly well understood components may have a high cognitive mapping score because there are many components, while the overall visualization and each of the components may be quite simple.

    This metric may work well for comparing small multiples (section 4.3.2), since all the visual components of a small multiple occur within a limited physical space and represent various different data attributes of the same underlying object. By applying this metric to small multiples, the confounding factor of compound visualizations is removed.

    Further research into this metric will be valuable; for example, the naïve point scoring system could more closely mirror subjective interpretations if the dimensional scores were not assigned arbitrary values but were appropriately weighted.

    Table 6-2 in the appendix lists the mapping score for many visualizations.

  • 4.3.6 Occlusion
  • Complete occlusion of data objects hides information from the user and is undesirable. Occlusion is a fundamental issue with 3D representations: since the 3D scene is composed of objects and can be viewed from any angle, there will be viewpoints wherein some objects restrict the visibility of other objects behind them. Complete occlusion occurs when one object completely obscures another object, and partial occlusion occurs when one object only partially obscures another. Complete occlusion of significant parts of the scene is not desirable. Often 3D navigation of the scene to alternative viewpoints is done precisely to look around or behind objects, and navigation may continue until a particular task has been addressed. However, a scene dense with many objects can be difficult to navigate to an optimal viewpoint and some occlusion will occur. If there is too much occlusion it becomes difficult for the user to make judgements about the data. Thus, a measure of occlusion may be worth considering:

    Occlusion percentage = (number of data points completely obscured) / (total number of data points)

    In most visualizations, optimal viewpoints are chosen before presenting or publishing the resulting image. Typically these viewpoints do not have much complete occlusion: the intent of these images is to communicate and a viewpoint with less occlusion is better able to convey a message based on the data. Thus the occlusion percentage should be measured with respect to a desirable or commonly used viewpoint.

    It is desirable to aim for 0% complete occlusion where all the data is of interest, such as would be found in most information visualizations. Typical values may range from 0 to 10% occluded for an information visualization at Visible Decisions. The actual threshold depends on the application; for example, a system control visualization requires close to 0% occlusion, while a volume visualization, such as an image of a cloud, will always have a back side occluded, thus 50% or more may be occluded. Similar values for occlusion percentage can be seen for a variety of 3D visualizations in Table 6-3 in the appendix.
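    A minimal sketch of this measure (how completely obscured points are detected is left to the rendering system; the counts here are illustrative):

```python
def occlusion_percentage(completely_obscured, total_points):
    """Fraction of data points completely obscured from a given
    (typical or published) viewpoint."""
    return completely_obscured / total_points

# 30 of 1,000 points hidden behind other geometry from the default
# viewpoint: 3%, within the 0-10% range typical of VDI visualizations.
print(occlusion_percentage(30, 1000))  # 0.03
```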

    Partial occlusion must be considered separately from complete occlusion. Since a visual object may redundantly encode data through multiple visual attributes, it is feasible for an object that is highly but not completely obscured to be reasonably interpreted. For example, visual objects encoding the same variable using both color and height can still be reasonably interpreted under partial occlusion by evaluating the color.

     

     

    Figure 4-29: Partial occlusion with sufficient redundancy.

    Image by author. <img src="occlusion1.gif">.

    The above image shows a bar chart using cubes where length, width, height and color of each bar are data dependent (with the bars sorted by height). The result is a visualization with little complete occlusion and a lot of partial occlusion. The use of cubes results in redundant geometry that from many viewpoints provides enough partial information to establish the data.

     

     

    Figure 4-30: Progressive partial occlusion obscuring information.

    Image by author. <img src="occlusion2.gif">.

    This metric does not address partial occlusion. The above figure shows a relationship diagram. The endpoints are visible in all cases, but as the obscuring object becomes larger, the certainty of the relationships degrades. A weighted measure based on the percentage of partial occlusion would capture the effect in the above image, but it would also penalize the partially occluded bar chart image above, which remains interpretable because of its significant redundant geometry. The negative impact of partial occlusion is highly dependent on the scene in which it occurs. The LME visualization (section 3.2.3) has some partial occlusion, but is much more difficult to comprehend than the partially occluding bars shown above.

     

     

    Figure 4-31: Overlapping layers comparison.

    Image by author. <img src="occlusion2.gif">.

    Both images are from the same viewpoint onto the same scene with the same small multiples. The scene on the left has a compressed vertical axis for the small multiples, resulting in very little spatial overlap between adjacent small multiples. The scene on the right has a stretched vertical axis for the small multiples, resulting in many overlapping small multiples and greater difficulty in the perceptual discrimination between small multiples.

    An alternative metric would be the maximum number of simultaneously overlapping layers. It is hypothesized that complex visualizations will have many simultaneously overlapping layers. Multiple overlapping layers burden the user’s cognitive processes with visually tracking and separating the different layers. By contrast, a well designed small multiple spatially separates each small multiple, thereby avoiding overlapping layers of objects and reducing the cognitive load, as can be seen in Figure 4-31 above. A measurement of overlapping layers is a topic for future research.

  • 4.3.7 Legend, scale and annotation
  • Visualizations require learning. A roadmap with a dense display of information requires one to consult the legend, scales and annotations. Similarly, a visualization requires devices to help the user decode the representation. Although separate documentation and expert users can explain a visualization, these references are not a direct part of the visual representation and can become separated from the visualization. Legends, scales and annotations provide an immediate graphical reference that can be used in context to understand the information represented. Without these, printouts are not useable by someone unfamiliar with the visualization. Some users, when confronted with a new visualization, will not initially interact with it; this renders purely interaction-based annotation useless.

    Legends are important, particularly for complex visualizations, such as SP500 minimalist representation (section 3.3). The best location to place a legend within a scene has always been a challenge, since the user is free to move about the scene and as a result the legend may not be within the field of vision. Legends have evolved over time at VDI and since 1997, legends typically are located as an overlay in a corner of the window viewing onto the 3D scene. This innovation permits the legend to be constantly visible, without reducing the window area for the visualization. An example of such a legend can be seen in the upper left and upper right corners of the 3D scene in Figure 3-19.

    Brushing can be utilized for these roles. Some VDI visualizations have used brushing to show some legend information, such as the current axes connections at the bottom of the brush in SeeIT version 1.1 (section 3.12 and the image below). If brushing shows the data values that map to the visual attributes for the brushed object, the brush can also provide basic scale information.

    Placing scales directly in the scene follows from 2D charting, which many users are already comfortable with. In cases where different portions of the scene occur at different scales, it is desirable to make these scales explicit to avoid confusing the user.

     

    Figure 4-32: Scene Annotations (in SeeIT).

    Image by author. <img src = "seeItLegend1.gif" >

    A sample scene from SeeIT version 1.1 in use. Note the large amount of text around the scatterplot:

  • 4.3.7.1 Labeling in charts and scientific visualization
  • Tufte [Tuf83, 90, 96] includes numerous recommendations for the use of text within information graphics including:

    In Graphing Statistics and Data: Creating Better Charts [WWPJH96], the authors consider annotation an inseparable component of a chart and include recommendations such as:

    Foley and Van Dam [FV82] recommend the proper use of labeling to minimize memorization. Avoid forcing the user to memorize information such as commands or legends. Al Globus and Eric Raible [GR94] believe legends and annotations are required for scientific visualizations.

    Note that many of the recommendations for the use of annotations suggest placing annotations directly beside the intended objects, reducing reliance on legends to carry too much mapping information and thereby burdening the user. This concurs with Larkin and Simon’s [LS87] determination that the use of location to group common information about a single element frees the user from matching symbolic labels.

  • 4.3.7.2 Guideline: Use legends, scales and annotations.
  • Legends, scales and other annotations are an important aid to both novice users for learning a visualization and to casual and expert users for reference.

    Justification: Annotation, legends and scales have been important throughout the history of information graphics. At Visible Decisions, all visualizations have included some labeling, and the amount of labeling has generally increased over time. There are many possibilities for the placement of these labels within a 3D visualization, ranging from aligned labels in 3D space, to horizontally aligned labels overlaid on the 3D scene, to interactive labeling, such as brushing.

  • 4.3.8 Illusions and Misinformation
  • Perspective, computer graphics hardware, juxtaposition and perception can create artifacts that can be distracting or even misinterpreted as significant. Common examples are:

    1. Moiré: Dense grids, many parallel lines or parallel long thin bars in 3D can create moiré patterns on computer graphics screens. Moiré draws the user’s attention to the artifact, not the information. Interactions such as filtering or scaling can overcome moiré, as can redesign to avoid dense grids, lines and long thin bars.

     

     

    Figure 4-33: Moiré patterns in grids.

    Image by author. <img src="moire2.gif">.

    2. Alignment and overlap: Objects can align or overlap when viewed from a particular viewpoint and confuse the user into seeing a single object instead of two distinct smaller objects. Using objects with borders, depth, or different colors creates visual boundaries for each object.

    Figure 4-34: Overlap without distinction.

    Image by author. <img src="redundancy1.gif">

    3. Break-up: Long thin polygons may appear to be multiple polygons when viewed from a distance; lines rarely do. This is an artifact of 3D graphics rendering libraries. One solution is to draw lines around polygons that may become long and thin.

     

    Figure 4-35: Breakup of thin polygons.

    Image by author. <img src="moire1.gif">

    4. Color interpretation: The perception of a color is influenced by the colors around it, as has been shown in a number of experiments, such as the example of the two small squares in fields of different color below. To avoid this effect, colors with little or no saturation should be used for large areas such as backgrounds.


     

    Figure 4-36: Color interpretation biased by surrounding color.

    Image by author. <img src="illusionColor2.gif">

    The two small squares are identically colored, however, they are not perceived
    to be the same – the surrounding field of color biases the interpretation.

    Moiré was a frequently recurring problem in VDI visualizations. Typically, moiré occurs in visualizations with dense grids or cubes. The solution is to:

    Overlap can be addressed by use of redundancy as described in section 4.3.5.1.

    Break-up occurs in some early versions of VDI visualizations such as the inventory viewer (section 3.1) and timeseries (section 3.9). Drawing an outline around shapes that may become thin and elongated, such as bars, can rectify this:

     

     

    Figure 4-37: Outline around thin shapes.

    Image by author. <img src="wireOutline1.gif"> <img src="wireOutline2.gif">

    Outlines have also been used in other graphic domains in the past:

    Color is addressed in the next section.

  • 4.3.8.1 Guideline: Avoid moiré, excessive overlap, breakup and color misinterpretation.
  • Moiré, overlap, breakup and color misinterpretation can cause confusion and misinformation. Careful design of a visualization can minimize these effects. Although these effects are not addressed frequently in visualization research, guidelines exist for 2D information graphics such as Tufte [Tuf83] (particularly for moiré and grids).

  • 4.3.9 Color
  • Color is a complex design element. Color can be used in many different ways: discretely, as a single continuous dimension, or as two or three continuous dimensions. Color is not addressed in detail in this thesis; it is a sufficiently complex topic to warrant its own detailed analysis and guidelines.

    As general guidelines, from experience at VDI, color should be used carefully and tested with each visualization with the user population. In particular:

  • 4.3.10 Visualization Conclusions
  • The effectiveness of a visualization is influenced by the design and organization of the visual representations of the data. As illustrated in the above section, a wide variety of factors impacting visualization effectiveness are addressed with corresponding guidelines, including:

  • 4.4 Interaction
  • Interaction is a key difference between charts and visualizations. Interaction permits the user to manipulate the visualization to find and identify patterns visually while a chart is merely a static mapping of data to a representation.

  • 4.4.1 Do not rely on interaction
  • Visualizations that rely on interaction to be comprehended cannot be printed out, published, etc. Users may not be comfortable with interactions until they gain confidence with the visualization. Dependence on interaction limits the audience to a small subset who can use the visualization in an interactive environment and are comfortable with interacting with the visualization. Information visualization often needs to be disseminated to a wide audience, even if the application was originally designed for a few select users. Thus, any information visualization should not rely on interaction.

    A 3D scatterplot requires rotation of the scene so that a user can differentiate between points that are near and far. As a result, any printout of a 3D scatterplot is of limited value. VDI has tried various other cues (brightness, color, size) to illustrate depth, but in general we have found 3D scatterplots ineffective.

    This result is encapsulated in various other guidelines, such as the need for a reference context (section 4.3.3); legend, scale and annotation (section 4.3.7); and brushing (section 4.4.4), a very easy-to-use and highly valuable interaction.

  • 4.4.2 Interaction is required to explore data sets
  • Interaction permits the user to work with a much larger data set than can be presented on the screen at once. By drilling-down, animating, changing axes or adjusting the data model, the user can explore a data space orders of magnitude larger than can be assembled into a singular 3D scene. For example, the decision support (Figure 3-18) and the inventory viewer (Figure 3-1) visualizations contain interactive "what-if" capabilities: to pre-compute and display all combinations and permutations would result in millions of possible alternatives; whereas interaction enables the user to explore any arbitrary "what-if" scenario on the fly.

    Interaction also permits the user to remove data from the display. By slicing, filtering, zooming and querying the data, the user can quickly narrow a search through the information. All visualizations described in the previous work include interaction, as do all VDI visualizations. Stuart Card of Xerox PARC, a respected authority on information visualization, contends that information visualization cannot be considered out of the context of interaction. Bill Wright, another authority, contends that 3D visualization is of no value without interaction: since 3D visualization maps data into a visual representation, issues such as occlusion, uncertainty about the mapping, insufficient color ranges, etc., are possible, and these can be overcome with appropriate interactions.

  • 4.4.3 Simple navigation of the scene is imperative
  • Changing the viewpoint permits the user to:

    a. See around occlusions.

    b. Disambiguate uncertainties in the scene, such as an alignment of two different lines, or two objects of the same shading overlapping from a particular view.

    c. Zoom in for details and zoom out for context.

    Many 3D systems (and most VRML browsers) suffer from poor 3D scene navigation. Within the first 2 years of VDI’s existence the navigation model was identified as a key component to making visualizations successful. VDI’s initial navigation interface was slow and hard to use due to the use of a virtual trackball interface (discussed in section 3.1). An information visualization navigation model should:

    a. Keep the scene always on screen. It should never let the user manipulate the scene so that the user is looking at empty space. Most 3D navigation models, and most VRML viewers, do not accommodate this. This puts a significant burden on the user to maintain control over the current view and to navigate in such a way that they can get back to where they had just been. Many 3D navigation models in fact do not permit simple means of backing up or undoing navigation (a flight simulator interface does not allow the plane to fly backwards).

    b. Use a steady-cam model. Navigation should not permit the scene to roll; i.e. the horizon should remain horizontal. Information is encoded in a space where the viewer perceives and associates attributes with directions such as up-down, an assumption that follows from real-world experience (most people do not experience roll in everyday tasks). Permitting the scene to roll destroys encodings such as up-down, and as a result the task becomes more difficult to perceive. Thus, flight simulator interfaces are inappropriate. (A sketch of a navigation model with these properties appears below.)

    c. Provide consistent interaction. Virtual trackball interfaces (e.g. OpenInventor and many VRML browsers) rotate the scene differently depending on the initial location of the mouse-down event. As a result, the system behaves differently for what the user perceives as the same interaction: e.g. a horizontal click-and-drag is interpreted by the user as one gesture, but the system responds differently based on the initial mouse-down location.

    Figure 4-38: VRML browser with chart rolled upside-down.

    Image capture of Microsoft ® Internet Explorer ® with chart generated by author. <img src="vrmlViewer1.gif">

    Most VRML viewers, as of 1997, do not keep scenes on screen, do not use steady-cam models, and do not provide consistent interaction. Here a bar chart has been rolled using a virtual trackball interface. The user can always click "home" to get back to the starting view - but what if the start view is not the one you want, and there is no easy way to get to the one you want or go back to one you had?

    d. Provide fast feedback. A user can more quickly orient a scene if provided with a subset or proxy for the scene than if the user must wait until the entire scene redraws. Fast feedback also confirms to the user that the visualization is actually working. Early visualizations done at VDI had redraw responses that took as long as 1 second: an impatient or novice user would overcompensate, assuming that the small motion did nothing, and as a result exaggerated the navigation to such a degree that navigation was impossible.
    e. Provide alternatives to mouse navigation. The mouse is a complex pointing device and some users are not adept at using one. Keyboard shortcuts, GUI buttons or context menus for the most common navigation techniques provide reasonable alternatives.
    f. Provide viewpoint persistence. Once the user has navigated to some desired location, the application should enable the user to easily return to that viewpoint.

    All of the above have been incorporated into the default navigation for Visible Decisions applications. Further, since novices may feel uncomfortable navigating a 3D scene with a mouse, or may be unaware that navigation is feasible with the mouse, most VDI visualizations (after Risk Movies, section 3.10) also contain buttons in the GUI interface for easily navigating the 3D scene to fixed points of view.
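    A minimal sketch of an orbiting steady-cam navigation model with several of these properties (an illustrative assumption, not VDI's actual implementation):

```python
import math

class SteadyCam:
    """Orbits a focus point with yaw and pitch only: the horizon can
    never roll, pitch is clamped so the user cannot flip the scene,
    and the focus point keeps the scene on screen."""

    def __init__(self, distance=10.0):
        self.yaw, self.pitch, self.distance = 0.0, 30.0, distance

    def drag(self, dx, dy):
        # The same drag always produces the same rotation, regardless
        # of where the drag started (unlike a virtual trackball).
        self.yaw = (self.yaw + dx * 0.5) % 360.0
        self.pitch = max(-89.0, min(89.0, self.pitch + dy * 0.5))

    def position(self, focus=(0.0, 0.0, 0.0)):
        # Camera position on a sphere around the focus point.
        y, p = math.radians(self.yaw), math.radians(self.pitch)
        return (focus[0] + self.distance * math.cos(p) * math.sin(y),
                focus[1] + self.distance * math.sin(p),
                focus[2] + self.distance * math.cos(p) * math.cos(y))

cam = SteadyCam()
cam.drag(40, -10)      # consistent interaction; horizon stays level
print(cam.position())
```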

    Usable 3D navigation is also critical to making CAD systems useful. Commercial 3D modeling systems where navigation is a core component of the system’s usability (section 2.4.3), such as Alias Studio, have incorporated many of the navigation interactions described above. Fred Brooks’ pioneering work in virtual reality (section 2.4.4.1) outlines a number of navigation issues. Virtual reality’s initial appeal may have been largely due to its "natural" navigation interface (head and body movement correlate directly to movement in the 3D scene - section 2.4.4.2).

  • 4.4.4 Brushing: User must be able to drill-down to underlying data
  • An information visualization presents data as a visual representation. Many users require the underlying data as a part of routine analysis of the information. Brushing is a general technique of pointing at graphical objects and quickly invoking quantitative feedback on that object. A commonly implemented form of brushing is a transparent textual overlay of detailed numeric values which appears at the location where the user is currently moving the mouse. With the popularization of graphic displays, Foley and Van Dam [FV82] recommended in 1982 that feedback and error messages should be positioned close to the work area or cursor and not in a separate message area. Feedback in separate message areas is still common in a number of commercial 2D and 3D visualization software packages. By placing the feedback close to the cursor, visual continuity is maintained, as opposed to forcing the user to switch from the work task to a message task. The first well-known form of brushing appears as part of the built-in help in graphical user interfaces (GUIs): "Balloon Help" on Macintosh computers and "ToolTips" on Microsoft Windows computers.
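    A minimal sketch of a brush handler (the scene, pick and show_brush helpers are hypothetical, not a specific toolkit's API):

```python
def on_mouse_move(scene, x, y):
    """Hit-test the scene at the cursor and show the brushed object's
    underlying data values next to the cursor, not in a separate
    message area."""
    obj = scene.pick(x, y)
    if obj is None:
        scene.hide_brush()
        return
    lines = ["%s: %s" % (attr, value) for attr, value in obj.data.items()]
    scene.show_brush(text="\n".join(lines), near=(x, y))
```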

    In the domain of visualizing information, an early proponent of brushing was Cleveland [Cle93], who used brushing for identification in various 2D charts, such as this identification of an outlier in a scatterplot:

     

    Figure 4-39: Cleveland's brushing

    Image by author based on [Cle93]. <img src="clevelandBrushing1.gif">

  • 4.4.4.1 Identification
  • Edward Fowlkes in 1969 [Cle93] discovered brushing as a valuable tool for identifying outliers in scatterplots. Cleveland uses brushing in a variety of different ways, although all his examples show brushes that present one line of text for identifying individual objects or groups of objects. Visible Decisions has used brushing to provide more details on each object - for example, all the attributes for an object may be displayed in a brush.

     

    Figure 4-40: Example brush in SeeIT displaying data and attribute mappings.

    Image by author. <img src="seeItBrush1.gif">.

    This can be useful to:

    Brushing can also be used to present the original data in its original format, whether numeric, text, image or other media.

    Brushing for identification frees the user from symbolic label matching (i.e. looking at an object, then looking at a legend or table of codes), particularly when numerous or complex attributes exist for an object. Larkin and Simon [LS87] identified the avoidance of symbolic label matching as one of the key benefits of graphical representations over sentential representations (section 2.2.5).

    TableLens (section 2.5.2.6) can embed labels directly in the scene depending on user interactions similar to brushing. Some implementations of Hyperbolic trees (section 2.5.2.2) include brushing in the scene, or in a status bar at the bottom of the window.

  • 4.4.4.2 Verification and Education
  • Brushing is valuable for a number of reasons.

    First, brushing can be used in a very simple "tool-tip" manner: point at an object and receive a few words of help about that object (for example, a statement regarding the mapping of that object). With the addition of quantitative data, the user can validate the correctness of the data and verify his/her cognitive mapping of the data to the visualization. Since brush feedback is quick, the user can brush a number of different data points and compare the actual data revealed by the brush to the visual presentation. Initially, the user can do this to create and validate a mental model; for example, to validate that values below zero map to red and values above zero map to green. As the user learns the model, the brushing and mental comparisons are replaced with learned rules (red is negative; green is positive).

    As identified by Larkin and Simon [LS87], learning to use a diagram precedes the effective use of a diagram. Learning is a process of committing a set of rules to long term memory to form an instantiated graph schema [Loh91]. The process of brushing enables the user to acquire and validate a mental model for a graphical display of information. Observations of actual users with actual data show what appears to be counter-intuitive behavior: presented with a new visualization, many users will initially search for data points that they know, validating what they already know. It appears that the user is verifying the graphical model against the knowledge that he already has, in effect creating and validating a mental model for the graphical display.

    For example, in an application for a major bank’s review of credit policy, a visualization was created with layers of drill-down, starting with a choropleth map (colored regions) and drilling down to various histograms of different risks. The initial choropleth map used the same small multiple as the detailed drill-down and was created primarily as an interface for selecting different data subsets for the drill-down. In actual use, however, users spent time with the initial choropleth map validating the representation of the small multiple against their existing understanding of the overall data before proceeding to drill down. When the application was reviewed by the CEO, the CEO first reviewed the high level map ("Take a look at California - I heard they did well.") before moving on to detailed analysis.

  • 4.4.4.3 Guideline: Use brushing for drill-down information and education.
  • Brushing is invaluable for making visualization meaningful. It can provide the underlying details behind the visual attribute for identification and verification as well as providing helpful descriptions for aiding the user. This information helps the user build a mental model of the visualization. Lohse [Loh91] and Larkin and Simon [LS87] provide cognitive models supporting the effectiveness of labels in context. Information graphics professionals recommend the use of labels in context whether in an interactive manner as a brush in the visualization domain (e.g. [FV82]); or as static text in the 2D graphic domain (e.g. [Tuf83, WWPJH96]).

  • 4.4.5 Searching for Thresholds and Individual Items
  • As a corollary to the above "Brushing", a user may be interested in finding one data point (e.g. "Where is item x in this scatterplot?") or a range of data points (e.g. "Where are all the items > 10?"). These actions may be directed ("I need this item"), exploratory ("I wonder where this occurs in the dataset?") or verificational ("I know this observation is close to average; let’s verify how the visualization shows that."). These questions must be accommodated in some way:

     

    Figure 4-41: Filtered subset of data in SeeIT.

    Image by author. <img src="seeItFilter1.gif">.

    Scatterplot of the Fortune 100 companies. All the companies with a positive growth (Rev_Pct_Change) are above the red line. A filtered subset (all companies in "Life Insurance") are shown with yellow wireframe boxes around them.

     

    The filtered subset can then be displayed in the interface exclusively (when the other data points are not of interest) or differentiated from the non-selected data in some way such as dimming out the non-selected data points (when the context from which the subset is selected is important).

    Filtering can also be applied in different ways. For example, filtering can be used to remove all graphic objects that do not meet a particular criterion from the display (for example, remove all the cones and flags for objects not having attribute X > 100). Filtering can also be used to remove a graphic entity from each of the graphic objects (small multiples) in the display (for example, remove the flag off the top of each cone because that attribute is not important to the current analysis).
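    A minimal sketch of both filtering styles (the records and field names are illustrative):

```python
companies = [
    {"name": "Acme", "growth": 12.0, "sector": "Life Insurance"},
    {"name": "Binco", "growth": -3.5, "sector": "Energy"},
    {"name": "Cogs", "growth": 7.2, "sector": "Life Insurance"},
]

# 1. Remove whole graphic objects that do not meet a criterion:
selected = [c for c in companies if c["growth"] > 0]

# 2. Remove one attribute (graphic entity) from every small multiple,
#    e.g. the attribute mapped to the flag on top of each cone:
without_flags = [{k: v for k, v in c.items() if k != "growth"}
                 for c in companies]

print([c["name"] for c in selected])  # ['Acme', 'Cogs']
print(without_flags[0])               # {'name': 'Acme', 'sector': ...}
```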


    Figure 4-42: Filtered visual attributes in SeeIT. [Image by author: seeItFilter2.gif]

    Same Fortune 100 scatterplot as before. The flags on top of the cones have been turned off.

    Cleveland previously identified filtering as a valuable interaction within multivariate scatterplots, enabling the correlation of variables within a specified subset of the data.
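    A small sketch of both filtering modes, assuming tabular data in pandas with hypothetical column names:

        # Sketch of the two filtering modes described above (hypothetical data).
        import pandas as pd

        df = pd.DataFrame({
            "Company": ["Acme", "Globex", "Initech", "Umbrella"],
            "Revenue": [120.0, 85.5, 64.0, 210.0],
            "Rev_Pct_Change": [4.2, -1.3, 9.8, 0.5],
            "Industry": ["Life Insurance", "Energy", "Software", "Life Insurance"],
        })

        # Mode 1: remove whole graphic objects that do not meet a criterion
        # (e.g. keep only objects with attribute X > 100) ...
        subset = df[df["Revenue"] > 100]

        # ... or mark a categorical subset, keeping the rest as dimmed context.
        selected = df["Industry"] == "Life Insurance"

        # Mode 2: remove one graphic entity (one visual attribute) from every
        # small multiple, e.g. turn off the flag on top of each cone.
        reduced = df.drop(columns=["Rev_Pct_Change"])

        print(subset, selected, reduced, sep="\n\n")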

    Justification: Finding thresholds and individual items is another means of verification, as well as identification and exploration. These interactions permit the user to verify expectations against the data, locate specific items of interest, and explore ranges and subsets of the data.

  • 4.4.6 Interaction permits unforeseen combinations and permutations
  • Data used in information visualizations is often multivariate: interaction permits queries across multiple dimensions of the data simultaneously. The data can often be analyzed in many ways (e.g. summaries, differences, averages, ratios, distributions). The data may have different states; for example, a real-time stock visualization requires different representations during trading hours vs. after trading hours. Alternatively, the data may have wide variance within any given state; for example, a slow trading day where stocks change less than half a percent vs. a day when most stocks lose 10% of their value.
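    As a sketch of such simultaneous multi-dimensional queries (the data and column names are hypothetical), a single interactive query can combine a categorical grouping, a quantitative threshold and several aggregates at once:

        # Summaries, averages and a ratio computed in one pass (hypothetical data).
        import pandas as pd

        stocks = pd.DataFrame({
            "Ticker": ["AAA", "BBB", "CCC", "DDD", "EEE"],
            "Sector": ["Bank", "Bank", "Tech", "Tech", "Energy"],
            "Change_Pct": [-0.4, 0.3, -11.0, 2.1, 0.1],
        })

        # Distribution of daily change by sector (summary and average together).
        summary = stocks.groupby("Sector")["Change_Pct"].agg(["count", "mean", "min", "max"])

        # Ratio across the whole set: share of stocks down more than 10%,
        # distinguishing a slow trading day from a broad sell-off.
        sell_off_ratio = (stocks["Change_Pct"] < -10).mean()
        print(summary, f"sell-off ratio: {sell_off_ratio:.0%}", sep="\n")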

    Another example where interaction is of value is in perception. Cleveland found that line charts are often most revealing when the lines within the chart are close to 45-degree angles, regardless of the aspect ratio of the overall chart (Figure 2-10). Simple interactive scaling would permit the user to adjust the heights based on the data, letting them find perceptually revealing proportions on their own.
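    A minimal sketch of such rescaling, using a simplified variant of Cleveland's criterion (scale the heights so the median absolute segment slope is banked to 45 degrees); the data and function name are hypothetical:

        # Banking to 45 degrees (simplified): choose a y-scale so that the
        # median absolute segment slope is 1, i.e. roughly a 45-degree angle.
        from statistics import median

        def banked_scale(xs, ys):
            slopes = [abs((y1 - y0) / (x1 - x0))
                      for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:]))
                      if x1 != x0]
            if not slopes:
                return 1.0
            m = median(slopes)
            return 1.0 / m if m else 1.0

        # Hypothetical monthly series: a steep series is flattened, a flat one
        # stretched, so rates of change are judged at close to 45 degrees.
        xs = list(range(12))
        ys = [3, 8, 20, 35, 60, 90, 120, 160, 210, 260, 330, 400]
        banked_ys = [y * banked_scale(xs, ys) for y in ys]

    An interactive control, such as a slider over this scale factor, would let the user fine-tune the proportions from this starting point.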


  • 4.4.7 Interaction conclusions
  • Interaction is required by 3D visualizations to enable users to navigate the scene, drill down to the underlying data via brushing, search for thresholds and individual items, and explore unforeseen combinations and permutations of the data.

    At the same time, the visualization should not be so dependent on interaction as to make still images of the scene ineffective, since interaction does not exist in a number of output media.

  • 4.5 Guideline and Metric Conclusions
  • 3D information visualizations can be poorly designed. They may use inappropriate data or may be poorly planned. They may suffer from poor organization, lack of reference context or excessive occlusion. They may be confusing or difficult to remember due to poor mappings of data attributes to visual attributes. Case studies and prior work from various disciplines create a body of knowledge that informs the creation of visualizations through guidelines and metrics. These guidelines and metrics address the essential project issues of task knowledge and data, visual design, and interaction.

    Task knowledge is knowledge of the problem domain that the visualization seeks to address. The success of a visualization requires a clear comprehension of the task domain, and a clear goal is required to determine whether the visualization has been successful. The goal definition should outline the requirements of the tasks in support of the goal, including decision, communication, information and workflow requirements, as well as existing artifacts. To support these tasks, the appropriate data must be identified, including supplementary data, calculations and special cases. The data should be characterized by type (i.e. categorical, ordered or quantitative), organized into dimensions and ranked. This assists with design decisions in the creation of the visual design.

    Visualization design is the mapping of the problem domain into visual representations that help the users achieve their goal. The overall visualization can be effectively organized using a few techniques: a layout that the target users are already familiar with, a few known organizational principles, or the ranked dimensions from the task analysis. At a detailed level, the use of connotative small multiples effectively conveys multiple data attributes. A suitable (and measurable) reference context is required to reasonably locate any item in the scene and understand it in relation to its context. Careful use of redundancy and visual dimensions can improve the clarity and comprehension of the visualization. Occlusion and overlap should be reduced to improve readability and reduce confusion. Legends, scales and annotations provide assistance for learning and reference. Illusions, such as moiré, should be avoided to prevent confusion. Color is a complex design variable with many different possible uses and should be used with care.

    Interactions are techniques for exploring the visualization with immediate feedback. Navigation is required to move around 3D scenes, for example to move around partially obscuring objects or to zoom close to an item of interest. Brushing reveals the underlying data values represented by a marker within the scene and is extremely valuable for learning and identification. Filtering and other interactions permit exploration of subsets, combinations and permutations of the data.

    This set of guidelines and metrics for tasks, representations and interactions thus aids the visualization developer by encapsulating knowledge and previous work.

    --------------------------


    © Copyright by Richard Karl Brath 1999