
Helping Users Navigate Readily By Employing Quantitative Metrics


UX Researcher

(Team of 4)



Optimal Workshop

Microsoft Office Suite



User Research

Card Sorting

Tree Testing



Distance Matrix

Data Analysis


4 months

(Jan 2022 - May 2022)


The study aimed to improve the organization of information on the Stanford University website. It reviewed the website for navigation and usability issues and made recommendations based on the results of card sorting and tree testing exercises. We believe these organizational improvements will lead to a more usable and valuable experience for the variety of users seeking information.


Navigating the website tends to be challenging owing to the confusing organization of information

It feels disjointed due to the ambiguous options and categories. Oftentimes, users cannot easily find the information they are looking for. The Stanford University website was chosen for its vast amount of content, which can be overwhelming simply because of its size.



The purpose of this study was to find out how people organize the institutional content on the Stanford University website

To suggest improvements to the information architecture to improve the usability and effectiveness of the website



Reviewed the organization of content and made recommendations for restructuring the website.

Evaluated how the users would place content by conducting a closed card sort. Based on the results of the card sort, a tree test was designed to determine which information structure is perceived as more usable.



Website Review

The team conducted a review and analysis of the existing site architecture

To find all possible labels and categories

After conducting an open card sort of the existing information on Miro

The team decided to narrow down the categories and cards

Eventually, 45 cards or labels and 7 categories were selected

Based on the initial open card sort (internal to the team), it was decided to conduct a closed card sort with participants, consisting of 45 cards under 7 categories

Card Sort

An unmoderated closed card sort was conducted using Optimal Workshop

  • Since the test was closed, the participants were not able to add their own labels.

N = 25

The participants also answered 3 pre-questionnaire and 4 post-questionnaire questions.

  • The goal of these questions was to better understand participants' familiarity with university websites, gather demographics, and determine if the categories selected were sufficient.

  • Two of the questions were open-ended in order to gather qualitative data for analysis.

Hierarchical Cluster Analysis

Dendrograms generated using RStudio

Were chosen to analyze the data from the card sort given the nature of the data

The dendrogram generated using the average method was finalized

  • After examining 10 ways to build the dendrograms, the team chose hierarchical clustering with average linkage as the best fit.

  • This dendrogram was chosen because it clearly distinguished “About” cards, which focus on the university's history and facts, from “Resources” cards, which focus on external resources or additional information.

The dendrogram was cut at a separation height of 3 to divide the cards into 7 groups/categories

  • This aligned with the 7 categories we started with in the closed card sort and also gave the best representation.

  • This keeps the number of cards reasonable in each category for a lighter cognitive load. It is also the best at keeping the related cards together at the right distance based on our predetermined categories.
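The average-linkage grouping described above can be sketched in a few lines. The study ran its clustering in RStudio; the Python below is only an illustrative reimplementation, and the function name `average_linkage`, the five cards, and the distance values are all hypothetical. It merges the two closest clusters until `k` remain, which for average linkage gives the same groups as cutting the dendrogram at the matching height.

```python
from itertools import combinations


def average_linkage(dist, k):
    """Agglomerative clustering with average linkage, stopped at k clusters.

    dist: symmetric matrix (list of lists) of pairwise card distances.
    Returns a sorted list of clusters, each a sorted list of card indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def avg_distance(a, b):
        # Mean of all pairwise distances between members of a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Pick the pair of clusters with the smallest average distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: avg_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical distances for 5 cards: 0 = always sorted together, 1 = never.
cards = ["Campus Map", "Libraries", "History", "Facts", "Apply"]
dist = [
    [0.0, 0.2, 0.8, 0.9, 1.0],
    [0.2, 0.0, 0.7, 0.8, 1.0],
    [0.8, 0.7, 0.0, 0.1, 0.9],
    [0.9, 0.8, 0.1, 0.0, 0.9],
    [1.0, 1.0, 0.9, 0.9, 0.0],
]
groups = average_linkage(dist, k=3)
print([[cards[i] for i in g] for g in groups])
```

In the study the same idea was applied to 45 cards with the cut chosen so that 7 groups remained.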

Multidimensional Scaling (MDS)

From the MDS with no clustering, some distinctions could be seen at a glance.

The MDS with clustering showed patterns similar to the non-clustered version.

  • Items around the top right corner were also close together and likely to be a group of their own.

  • Items on the left side of the graph were a bit hard to distinguish and group.

  • Some of these could be grouped together in different ways with surrounding items.

  • Items that fell into groups like a - light blue (bottom right corner) or 6 - teal (top right corner) are clearly distinct from other groups and naturally grouped together.

  • The much closer items on the left were divided into three cluster groups: group 4 - red, 2 - yellow, or 7 - brown.

Participants likely found the items in the left area the hardest to categorize

  • These cards were very close in proximity and formed the biggest area of variation, which corresponds to the dendrograms as well.

This graph confirmed our dendrogram-based analysis.

  • Cards that were grouped together on the left that relate to “About” or “Resources” categories were key areas of divided categorization.

Matrix Analysis

The Distance Matrix confirmed the findings from HCA

The distance matrix visually represents the areas of consideration for tree testing.

The distance matrix represents where participants had trouble grouping cards

  • These are represented by the lighter blue squares forming the larger box at the centre.

  • Many of these cards were split between different categories.

  • The smaller darker blue box clusters represent where participants were consistent in the card grouping.

  • There is also additional fuzziness in the lighter blue sections of the data, which required digging into with our correlation matrix.

14 cards were identified that were split between more than one category

  • Based on the Distance Matrix and the Popular Placement Matrix from Optimal Workshop.

  • For these 14 cards, fewer than 50% of participants agreed on the most popular category, and the share for the second category was even lower.

  • This means that participants were split on which categories these cards should fall under.
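The under-50% rule above can be illustrated with a small sketch. It assumes the raw placements are available as one category per participant per card. The function name `split_cards`, the threshold parameter, and the sample data are hypothetical and do not come from the study's Optimal Workshop export.

```python
from collections import Counter


def split_cards(placements, threshold=0.5):
    """Flag cards whose most popular category received less than `threshold`
    of all placements.

    placements: dict mapping card name -> list of categories, one per participant.
    Returns a dict mapping card name -> (top category, share of placements).
    """
    flagged = {}
    for card, chosen in placements.items():
        top_category, top_count = Counter(chosen).most_common(1)[0]
        share = top_count / len(chosen)
        if share < threshold:
            flagged[card] = (top_category, round(share, 2))
    return flagged


# Hypothetical placements from 5 participants (the study had 25).
placements = {
    "A to Z index": ["Resources", "About", "Resources", "About", "Campus Life"],
    "Apply": ["Admissions"] * 5,
}
print(split_cards(placements))
```

Here "A to Z index" is flagged because no category reaches half of the placements, while "Apply" is not.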

Correlation Matrix

The spreadsheet from Donna Spencer’s workbook was used

  • To obtain the correlation matrix to further support decision-making with respect to categorization.

  • The matrix shows the percentage of participants that grouped cards under specific categories.

  • From this matrix, the team could clearly see which cards were harder for participants to sort and needed to be explored further.

The lighter blue colors and lower numbers signify a lower correlation

  • Cards such as “A to Z index”, “Campus Map”, “Careers”, “Alumni”, “Public Safety”, and “Libraries” are a few items that fall within this.

  • For example, A to Z index (card 15) is split between “Resources” and “About”. This indicates that participants were divided on where this card should go.

Grouped cards and categories

There are 4 groups that are clearly grouped together

  • While a few are split between “Graduate” and “Undergraduate”, these categories are more straightforward, given that they are specific programs within the greater school umbrella.

  • Stanford should consider clarifying in the name if the school is Undergraduate or Graduate. For example, “Stanford Engineering” was split between Undergraduate and Graduate, and this card could be renamed to indicate which category it belongs in.

The 14 cards were more challenging for participants to group and were split across 3 categories.

  • The categories that these cards are commonly split among are “Resources”, “About”, and “Campus Life”. These items were the most significant to take to the next step of tree testing.

  • The red box highlights tentative grouping that needs to be followed up with tree testing, with particular focus on the bolded cards, which were very split amongst participants

Tree Testing

The tree test was conducted for areas where the participants were the most split

  • These were Resources, About, and Campus Life

  • This allowed us to make an informed recommendation on which categories the five items (Campus Map, Careers, Libraries, Public Safety, and News) should be placed in

Achieved by studying where participants looked

  • When prompted with specific tasks

  • Through a between-subjects design

N = 32

(n = 16 for each tree)

The following research question

Was selected as we began the tree test

Research Question and Tasks

A task was created for each of the five items highlighted by the card sort analysis

  • That would make the users look for these items within a tree. The outcomes were recorded.

  • The tasks were the same for each tree, but the cards within the trees varied in destination.

The order of the tasks was randomized when delivered to the participants

  • To eliminate order effects

  • All the tasks were administered via Optimal Workshop


Hypotheses were developed for each task

  • The independent variable for each task was the location of the item associated with the task.

  • The dependent variables were lostness, time and success.
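The study does not spell out how lostness was computed. A commonly cited definition is Smith's lostness measure, L = sqrt((N/S - 1)^2 + (R/N - 1)^2), where S is the total number of nodes visited, N the number of different nodes visited, and R the minimum number of nodes needed to complete the task. The sketch below assumes that definition. The function name and the example paths are hypothetical, and whether the home node counts toward the totals is also an assumption.

```python
import math


def lostness(path, minimum_nodes):
    """Smith's lostness measure: 0 is a perfectly direct path, higher is more lost.

    path: ordered list of nodes a participant visited, revisits included.
    minimum_nodes: fewest nodes needed to reach the correct destination (R).
    """
    total_visited = len(path)         # S: every visit counts, including revisits
    unique_visited = len(set(path))   # N: distinct nodes only
    return math.sqrt(
        (unique_visited / total_visited - 1) ** 2
        + (minimum_nodes / unique_visited - 1) ** 2
    )


# Hypothetical paths for a task whose answer sits under "Campus Life".
direct = ["Home", "Campus Life", "Campus Map"]
wandering = ["Home", "Resources", "Home", "About", "Home", "Campus Life", "Campus Map"]

print(lostness(direct, minimum_nodes=3))
print(round(lostness(wandering, minimum_nodes=3), 3))
```

The direct path scores 0, and the score grows as participants backtrack or explore branches that are not needed.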

To avoid bias

  • None of the participants that took the previous card sort test were recruited for the tree test

  • Each tree had its own unique set of participants

Tree structures

The entire tree structure included all of the original 7 categories

  • Namely Admissions, Undergraduate, Graduate, Research, About, Campus Life, and Resources

  • But the cards tested were only located in the three categories that were determined to have the most issues from the card sort (Resources, Campus Life, and About).

"Did you have trouble finding this item? Please explain."

  • After each individual task, the participant was prompted to answer this qualitative question

  • At the end of the test, the participants filled out demographic questions

Data Analysis

RStudio was used to run the statistical analysis.

To compare participant data from the two trees.


A within-subjects design test was first conducted for each tree

  • Across the three dependent variables: time, success, and lostness

  • To check participant behavior within each tree

  • To assess the performance of each task.

A Shapiro-Wilk test was used

To evaluate if the data were normally distributed.

A Friedman Test was used to compare the variables

Since the data in both trees were not normally distributed
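As a rough illustration of what the Friedman test computes, the sketch below ranks each participant's measurements across tasks and derives the chi-square statistic Q = 12 / (n k (k + 1)) * sum(R_j^2) - 3 n (k + 1), where R_j is the rank sum for task j. The study ran this test in RStudio; the Python function names and the timing data here are hypothetical, and the sketch omits the tie correction that statistical packages usually apply.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def friedman_statistic(rows):
    """Friedman chi-square statistic, without tie correction.

    rows: one list per participant, holding one measurement per task.
    """
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for task, rank in enumerate(average_ranks(row)):
            rank_sums[task] += rank
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)


# Hypothetical task times (seconds): 4 participants x 3 tasks.
times = [
    [10, 20, 30],
    [12, 25, 28],
    [9, 30, 22],
    [11, 19, 35],
]
q = friedman_statistic(times)
print(q)
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom; a large value suggests that at least one task differs from the others.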

A post hoc analysis (Conover Test) was administered

To determine which tasks were significantly different (for the variables that were significantly different)



A between-subjects design test between the two trees

To assess their performance, for each of the five tasks across the three dependent variables.

For visual representation

The data were graphed as boxplots or pie charts.

A Mann-Whitney U test was run for comparison

  • Since the data were not normally distributed

  • For each task, the data were summarized first. The test was then run on each dependent variable (time, success, and lostness), comparing the two trees.
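The between-trees comparison can be sketched as follows. This is a minimal illustration, not the study's RStudio analysis: the function name and sample times are hypothetical, and the p-value uses the plain normal approximation with no tie or continuity correction, which is crude for very small groups.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Return (U, two-sided p) for two independent samples.

    U is the smaller of the two U statistics. The p-value uses the normal
    approximation without tie or continuity correction.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Count pairs where a beats b; ties count as half a win.
    u_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in sample_a
        for b in sample_b
    )
    u = min(u_a, n_a * n_b - u_a)
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u, p_two_sided


# Hypothetical completion times (seconds) for one task in each tree.
tree_1_times = [30, 42, 51, 38]
tree_2_times = [22, 25, 31, 19]
u, p = mann_whitney_u(tree_1_times, tree_2_times)
print(u, round(p, 3))
```

With 16 participants per tree, as in the study, the normal approximation is more reasonable than in this four-per-group toy example.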

Data Summary

Other data from Optimal Workshop was also examined to draw conclusions

  • When there was no significant difference and the data from R was not pointing definitively in one direction or another.

  • This included first click data, visited during data, and overall score from Optimal Workshop

To build the final inferences

Quantitative data were analyzed along with a look at the qualitative data from post-task and post-study questionnaire responses



To summarize, we rejected the null hypothesis for Task 1

  • For Task 1, our hypothesis was incorrect, as participants were less lost when “Campus Map” was listed under Campus Life.

  • Task 5 did not have significant findings for lostness but did have significant differences in time and success.

Findings for Tasks 2, 3, and 4 were not significant

  • But we made recommendations based on the quantitative and qualitative data gathered.

  • We looked at data across the board including but not limited to time, success rate and clicks in order to provide the best recommendation.


The team recommended moving forward with Tree 2

  • With significant results in Tasks 1 and 5 that support this recommendation.

  • Campus Map was most successful when placed under Campus Life, and News was most successful when placed under About. 

For Careers (Task 2) and Libraries (Task 3)

We recommend additional thought and testing, as there were no significant findings and the conclusions were speculative

Overall, Tree 1 performed poorly in the test

With low success rates and high reported difficulty.

For Public Safety (Task 4)

We recommend looking into renaming the category, with a suggestion to try out Campus Safety instead.

After these adjustments, the Stanford website should prove to be more usable

With less time wasted, and less frustration around finding the information the visitors need




Limitations and challenges

  • One of our biggest challenges during this test was getting the task question wording right. The questions were difficult to formulate in two ways. First, it was difficult to come up with a realistic scenario without using any of the words that appear in our tree, especially the related item. Second, even with an alternative way to ask participants to find the item, the way the questions were constructed or the words in them could easily be misinterpreted or bias participants' actions. We iterated on the questions a few times but, as the data show, the wording of Task 3 clearly misled participants.

  • Another challenge was the analysis and interpretation of non-significant results. For Tasks 2-4, the difference between trees was not significant, which made the trees difficult to compare. While we were able to reference Optimal Workshop’s additional numbers, ultimately the results cannot be completely conclusive.

  • We received rich qualitative data from our post-task questions, but response rates differed markedly between Tree 1 and Tree 2. Nine participants in Tree 2 skipped every post-task question, while not a single participant skipped a question in Tree 1. It is unclear whether this had an impact on our findings, but it could indicate lower integrity in the results from those participants.



  • It can be difficult to describe the task without using the words present in the labels. However, care has to be taken to do so, in order to avoid both misinterpretation of the task and leading users.

  • Non-significant results make it hard to make recommendations

  • It is essential to make a response to each question compulsory to ensure the integrity of the results
