Helping Users Navigate Readily By Employing Quantitative Metrics
(Team of 4)
Microsoft Office Suite
(Jan 2022 - May 2022)
The study aimed to improve the organization of information on the Stanford University website. It reviewed the site for navigation and usability issues and made recommendations based on the results of card sorting and tree testing exercises. For the variety of users seeking information, we believe these organizational improvements will lead to a more usable and valuable experience.
Navigating the website tends to be challenging owing to the confusing organization of information: it feels disjointed due to ambiguous options and categories, and users often cannot easily find the information they are looking for. Stanford.edu was chosen because its vast amount of content can be overwhelming simply by the nature of its size.
The purpose of this study was to find out how people organize the institutional content on the Stanford University website, and to suggest improvements to the information architecture that would improve the usability and effectiveness of the site.
Reviewed the organization of content and made recommendations for restructuring the website.
Evaluated how the users would place content by conducting a closed card sort. Based on the results of the card sort, a tree test was designed to determine which information structure is perceived as more usable.
The team conducted a review and analysis of the existing site architecture to identify all possible labels and categories. After an internal open card sort of the existing information on Miro, the team narrowed down the set; eventually, 45 cards (labels) and 7 categories were selected. Based on this initial open card sort, it was decided to conduct a closed card sort with participants, consisting of the 45 cards under the 7 categories.
An unmoderated closed card sort was conducted using Optimal Workshop
Since the test was closed, the participants were not able to add their own labels.
N = 25
The participants also answered 3 pre-questionnaire and 4 post-questionnaire questions.
The goal of these questions was to better understand participants' familiarity with university websites, gather demographics, and determine if the categories selected were sufficient.
Two of the questions were open-ended in order to gather qualitative data for analysis.
Hierarchical cluster analysis, with dendrograms generated in RStudio, was chosen to analyze the card sort data given the nature of the data. After examining 10 ways to build the dendrograms, hierarchical clustering with average linkage was chosen as optimal, and the dendrogram generated using the average method was finalized.
This dendrogram was chosen because of its clear separation of "About" cards, which focus on information about the university's history and facts, from "Resources" cards, which focus on external resources or additional information. The dendrogram was cut at a separation height of 3 to divide the cards into 7 groups/categories. This aligned with the 7 categories the closed card sort started with, while giving the best representation: it keeps the number of cards in each category reasonable for a lighter cognitive load, and it is best at keeping related cards together at the right distance based on our predetermined categories.
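The clustering step above can be sketched in Python. The team worked in RStudio; scipy's hierarchy module is the equivalent here, and the 6-card distance matrix below is made-up toy data standing in for the real 45-card co-occurrence distances.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy pairwise distances between 6 hypothetical cards
# (0 = participants always grouped the pair, 1 = never).
rng = np.random.default_rng(0)
d = rng.random((6, 6))
dist = (d + d.T) / 2          # symmetrize
np.fill_diagonal(dist, 0.0)

# Average-linkage hierarchical clustering, the method finalized above.
Z = linkage(squareform(dist), method="average")

# The study cut the dendrogram at a fixed height to obtain 7 groups;
# with toy data we ask for 3 clusters directly instead.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)  # cluster id (1..3) for each of the 6 cards
```

With real data, `fcluster(Z, t=3, criterion="distance")` reproduces the cut-at-height-3 decision described above.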
Multidimensional Scaling (MDS)
From the MDS with no clustering, some distinctions could be seen at a glance.
The MDS with clustering showed patterns similar to the non-clustered version.
Items around the top right corner were also close together and likely to be a group of their own.
Items on the left side of the graph were a bit hard to distinguish and group.
Some of these could be grouped together in different ways with surrounding items.
Items that fell into groups like the light blue group (bottom right corner) or group 6 (teal, top right corner) are clearly distinct from other groups and naturally clustered together. The much closer items on the left were divided into three cluster groups: group 4 (red), group 2 (yellow), and group 7 (brown).
Participants likely found the items in the left area the hardest to categorize. These cards were very close in proximity and formed the biggest area of variation, corresponding to the dendrograms as well. This graph confirmed our dendrogram-based analysis. Cards grouped together on the left that relate to the "About" or "Resources" categories were the key areas of divided categorization.
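The MDS projection can be illustrated with a short sketch. Classical (Torgerson) MDS is one standard way to turn a distance matrix into 2-D coordinates; the 5-card distance matrix below is toy data, not the study's.

```python
import numpy as np

def classical_mds(dist, k=2):
    """Project a symmetric distance matrix into k dimensions
    (classical/Torgerson MDS via double centering)."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (dist ** 2) @ J           # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]       # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))

# Toy distances: cards 0-2 are close to each other, far from cards 3-4.
dist = np.array([
    [0.0, 0.1, 0.2, 0.9, 0.8],
    [0.1, 0.0, 0.1, 0.8, 0.9],
    [0.2, 0.1, 0.0, 0.9, 0.9],
    [0.9, 0.8, 0.9, 0.0, 0.1],
    [0.8, 0.9, 0.9, 0.1, 0.0],
])
coords = classical_mds(dist)
print(coords)  # 5 x 2 coordinates; frequently co-grouped cards plot near each other
```

Cards that participants frequently grouped together land close in the plot, which is what makes the tight left-side cluster visually hard to separate.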
The distance matrix confirmed the findings from the HCA and visually represents the areas of consideration for tree testing: it shows where participants had trouble grouping cards. These areas appear as the lighter blue squares forming the larger box at the center; many of these cards were split between different categories. The smaller, darker blue clusters represent where participants were consistent in their grouping. There is additional fuzziness in the data, in the lighter blue sections, that required digging into with our correlation matrix.
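The distance matrix itself comes straight from the raw sort data. A minimal sketch, using hypothetical placements for 4 participants and 5 cards (the real study had 25 participants, 45 cards, and 7 categories):

```python
import numpy as np

# Hypothetical closed-card-sort data: placements[p][c] is the category
# index participant p assigned to card c.
placements = np.array([
    [0, 0, 1, 1, 2],
    [0, 0, 1, 2, 2],
    [0, 1, 1, 1, 2],
    [0, 0, 1, 1, 2],
])
n_participants, n_cards = placements.shape

# Distance = share of participants who did NOT put the pair of cards in
# the same category (0 = always together, 1 = never together).
dist = np.zeros((n_cards, n_cards))
for i in range(n_cards):
    for j in range(n_cards):
        together = np.mean(placements[:, i] == placements[:, j])
        dist[i, j] = 1.0 - together
print(np.round(dist, 2))
```

Low values (dark squares) mark consistent grouping; values near 1 (light squares) mark the split cards that motivated the tree test.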
Based on the distance matrix and the Popular Placements Matrix from Optimal Workshop, 14 cards were identified as split between more than one category. For these cards, agreement on the most popular category fell under 50%, and the percentage for the second category was even lower, meaning participants were split on which categories these cards should fall under.
The spreadsheet from Donna Spencer's workbook was used to obtain the correlation matrix, further supporting decision-making with respect to categorization. The matrix shows the percentage of participants that grouped each card under a specific category. From this matrix, the team could clearly see which cards were harder for participants to sort and needed to be explored further.
The lighter blue colors and lower numbers signify a lower correlation
Cards such as "A to Z index", "Campus Map", "Careers", "Alumni", "Public Safety", and "Libraries" are a few items that fall within this group.
For example, A to Z index (card 15) is split between “Resources” and “About”. This indicates that participants were divided on where this card should go.
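The "split card" rule can be sketched directly from placement data. Everything below is toy data (5 hypothetical participants, 5 cards, 3 categories), but the selection logic mirrors the under-50%-agreement criterion described above:

```python
import numpy as np

# Hypothetical closed-sort data: rows = participants, cols = cards,
# values = index of the chosen category.
placements = np.array([
    [0, 0, 1, 1, 2],
    [0, 0, 1, 2, 2],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 2, 1, 2, 1],
])
n_categories = 3
n_participants, n_cards = placements.shape

# Card x category matrix: percentage of participants placing each card
# under each category (the "correlation matrix" of the study).
pct = np.zeros((n_cards, n_categories))
for c in range(n_cards):
    for k in range(n_categories):
        pct[c, k] = 100 * np.mean(placements[:, c] == k)

# Flag cards whose most popular category got under 50% agreement,
# mirroring the rule used to pick the 14 cards for tree testing.
split_cards = [c for c in range(n_cards) if pct[c].max() < 50]
print(pct)
print(split_cards)  # card 4 is split 20/40/40, so it is flagged
```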
Grouped cards and categories
Four groups are clearly grouped together.
While a few are split between “Graduate” and “Undergraduate”, these categories are more straightforward, given that they are specific programs within the greater school umbrella.
Stanford should consider clarifying in the name if the school is Undergraduate or Graduate. For example, “Stanford Engineering” was split between Undergraduate and Graduate, and this card could be renamed to indicate which category it belongs in.
The 14 cards were more challenging for participants to group, and were commonly split among 3 categories: "Resources", "About", and "Campus Life". These items were the most important to take to the next step of tree testing.
The red box highlights tentative groupings that needed to be followed up with tree testing, with particular focus on the bolded cards that were very split among participants.
The tree test was conducted for the areas where participants were most split: Resources, About, and Campus Life. This allowed us to make an informed recommendation on which categories the five items (Campus Map, Careers, Libraries, Public Safety, and News) should be placed in. It was achieved by studying where participants looked when prompted with specific tasks, through a between-subjects design.
N = 32
(n = 16 for each tree)
A research question was selected as we began the tree test.
Research Question and Tasks
A task was created for each of the five items highlighted by the card sort analysis, requiring users to look for these items within a tree; the outcomes were recorded. The tasks were the same for each tree, but the cards' destinations within the trees varied. The tasks were randomized when delivered to participants to eliminate order effects.
All the tasks were administered via Optimal Workshop
Hypotheses were developed for each task
The independent variable for each task was the location of the item associated with the task.
The dependent variables were lostness, time, and success.
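Of the three dependent variables, lostness is the least self-explanatory. It is commonly computed with Smith's (1996) measure from the navigation path; we assume here that the tool's lostness score follows the same definition:

```python
import math

def lostness(unique_visited, total_visited, optimal_path):
    """Smith's (1996) lostness measure.
    N = unique nodes visited, S = total nodes visited,
    R = minimum number of nodes needed to complete the task.
    0 means a perfectly direct path; higher values mean more
    backtracking and wandering."""
    n, s, r = unique_visited, total_visited, optimal_path
    return math.sqrt((n / s - 1) ** 2 + (r / n - 1) ** 2)

# A participant on a perfectly direct 3-node path:
print(lostness(3, 3, 3))            # → 0.0
# A participant who needed 3 nodes but visited 6 (5 unique):
print(round(lostness(5, 6, 3), 2))  # → 0.43
```

Values above roughly 0.4 are conventionally read as "lost", which is why lostness complements the binary success measure.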
To avoid bias, none of the participants who took the previous card sort were recruited for the tree test, and each tree had its own unique set of participants.
The entire tree structure included all of the original 7 categories (Admissions, Undergraduate, Graduate, Research, About, Campus Life, and Resources), but the cards tested were located only in the three categories determined to have the most issues in the card sort (Resources, Campus Life, and About).
"Did you have trouble finding this item? Please explain."
After each individual task, the participant was prompted to answer this qualitative question. At the end of the test, participants filled out demographic information.
RStudio was used to run the statistical analysis and compare participant data from the two trees. A within-subject analysis was first conducted for each tree across the three dependent variables (time, success, and lostness) to check participant behavior within each tree and assess the performance of each task. A Shapiro-Wilk test was used to evaluate whether the data were normally distributed. Since the data in both trees were not normally distributed, a Friedman test was used to compare the variables. For the variables that were significantly different, a post hoc Conover test was administered to determine which tasks differed significantly.
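The within-tree pipeline can be sketched as follows. The study ran it in RStudio; scipy is the stand-in here, and the completion times are simulated, not the study's data:

```python
import numpy as np
from scipy.stats import shapiro, friedmanchisquare

rng = np.random.default_rng(1)
# Simulated per-task completion times for one tree:
# 16 participants x 5 tasks (seconds), deliberately skewed.
times = rng.exponential(scale=30, size=(16, 5))

# Shapiro-Wilk per task: small p-values indicate non-normal data,
# which is what pushed the analysis to non-parametric tests.
for task in range(5):
    stat, p = shapiro(times[:, task])
    print(f"task {task + 1}: W={stat:.2f}, p={p:.3f}")

# Friedman test: did task (the within-subject factor) affect time?
stat, p = friedmanchisquare(*[times[:, t] for t in range(5)])
print(f"Friedman chi2={stat:.2f}, p={p:.3f}")
```

For the Conover post hoc step, `posthoc_conover_friedman` from the scikit-posthocs package is one readily available implementation.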
A between-subjects analysis was then run between the two trees to assess their performance on each of the five tasks across the three dependent variables. For visual representation, the data were graphed as boxplots or pie charts. Since the data were not normally distributed, a Mann-Whitney U test was run for comparison: for each task, the data were first summarized, then the test was administered between the two trees for each dependent variable (time, success, and lostness).
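The between-trees comparison for a single task and variable reduces to one call. Again a sketch on simulated data, with scipy standing in for the RStudio analysis:

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(2)
# Simulated Task 1 completion times (seconds), n = 16 per tree,
# with Tree 2 deliberately faster.
tree1 = rng.exponential(scale=45, size=16)
tree2 = rng.exponential(scale=20, size=16)

# Two-sided Mann-Whitney U: appropriate because the data were not
# normally distributed and the two trees used independent samples.
u, p = mannwhitneyu(tree1, tree2, alternative="two-sided")
print(f"U={u:.1f}, p={p:.4f}")
```

Repeating this per task and per dependent variable reproduces the comparison grid described above.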
When there was no significant difference and the data from R did not point definitively in one direction, other data from Optimal Workshop were also examined to draw conclusions. This included first-click data, nodes visited during tasks, and the overall score from Optimal Workshop. To build the final inferences, the quantitative data were analyzed alongside the qualitative data from the post-task and post-study questionnaire responses.
To summarize, we rejected the null hypothesis for Task 1. Our hypothesis for this task was incorrect: participants were less lost when "Campus Map" was listed under Campus Life.
Task 5 did not have significant findings for lostness but did have significant differences in time and success.
Findings for Tasks 2, 3, and 4 were not significant, but we made recommendations based on the quantitative and qualitative data gathered. We looked at data across the board, including but not limited to time, success rate, and clicks, to provide the best recommendation.
The team recommended moving forward with Tree 2, with significant results in Tasks 1 and 5 supporting this recommendation.
Campus Map was most successful when placed under Campus Life, and News was most successful when placed under About.
For Careers (Task 2) and Libraries (Task 3), we recommend additional thought and testing, as there were no significant findings and the conclusions were speculative.
Overall, Tree 1 performed poorly in the test, with low success rates and high reported difficulty.
For Public Safety (Task 4)
We recommend looking into renaming the category, with a suggestion to try out Campus Safety instead.
After these adjustments, the Stanford website should prove more usable, with less time wasted and less frustration around finding the information visitors need.
Limitations and challenges
One of our biggest challenges during this test was getting the task question wording right. The questions were difficult to formulate in two ways. First, it was difficult to come up with a realistic scenario without using any of the words that appear in our tree, especially the target item itself. Second, even with an alternative way to ask participants to find the item, the construction or wording of a question could easily be misinterpreted or bias participants' actions. We iterated on the questions a few times, but as the results show, Task 3 clearly misled participants due to its wording.
Another challenge was the analysis and interpretation of non-significant results. For Tasks 2-4, the between-trees analysis was not significant, and it was difficult to interpret differences between trees when they did not reach significance. While we were able to reference Optimal Workshop's additional numbers, ultimately those results cannot be completely conclusive.
We received rich qualitative data from our post-task questions, but there was a notable difference between Tree 1 and Tree 2: 9 participants in Tree 2 skipped every post-task question, while not a single participant in Tree 1 skipped any. This raises a potential concern about the integrity of those participants' results, and it is unclear whether it affected our findings.
It can be difficult to describe the task without using the words present in the labels. However, care has to be taken to try and do so, in order to avoid misinterpretation of the meaning of the task and leading users.
Non-significant results make it hard to make recommendations
It is essential to require a response to each question to ensure the integrity of the results.