Quantitative Metrics / Information Architecture
(Team of 5)
(Jan 2022 - May 2022)
The study aimed to improve the organization of information on the Stanford University website. It reviewed the site for navigation and usability issues and made recommendations based on the results of card sorting and tree testing exercises. For the variety of users seeking information, we believe these organizational improvements will lead to a more usable and valuable experience.
Stanford.edu was chosen for the vast amount of content that can be overwhelming simply by the nature of its size. Navigating the website tends to be challenging owing to the confusing organization of information. It feels disjointed due to the ambiguous options and categories. Oftentimes, users cannot easily find the information they are looking for.
The purpose of this study was to find out how people organize the institutional content on the Stanford University website and to suggest improvements to the information architecture that would make the website more usable and effective.
We reviewed the organization of content and evaluated how the users would place it by conducting a closed card sort. Based on the results of the card sort, we designed a tree test to determine which information structure is perceived as more usable and made recommendations for restructuring the website.
In an undertaking to improve the university’s website, our team conducted a review and analysis of the existing site architecture.
After an initial 5-person open card sort of the existing information, conducted internally on Miro, the team reviewed all possible labels and categories and narrowed the set down to 45 cards and 7 categories for a closed card sort with 25 participants.
The team conducted the card sorting using Optimal Workshop. The 25 participants took part in an unmoderated closed card sort. The participants organized 45 labels into 7 categories. Since the test was closed, the participants were not able to add their own labels.
In addition, the participants answered 3 pre-questionnaire questions and 4 post-questionnaire questions. The goal of these questions was to better understand participants' familiarity with university websites, gather demographics, and determine if the categories selected were sufficient. Two of the questions were open-ended in order to gather qualitative data for analysis.
After the data were collected from the 25 participants, the team analyzed the quantitative results using hierarchical cluster analysis, generating dendrograms in RStudio. After examining all 10 ways to build the dendrograms, hierarchical clustering with average linkage was chosen as the optimal method, and the resulting dendrogram was cut at a separation height of 3 to yield 7 categories.
We looked at multiple ways to cut our dendrogram into groups and eventually settled on cutting it at a distance of around 3, dividing the cards into 7 groups. This cut was chosen because it clearly distinguished the "About" cards, which focus on university history and facts, from the "Resources" cards, which focus on external resources and additional information.
This aligned with the 7 categories we started with in the closed card sort while also giving the best representation: it kept the number of cards in each group reasonable, for a lighter cognitive load, and grouped related cards together at shorter distances. Based on the dendrograms, 7 groups, matching our initial category count, is the optimal number.
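The clustering step above can be sketched in code. This is a minimal illustration using SciPy rather than the RStudio workflow the team actually used, and the distance values are hypothetical stand-ins for the real card-sort co-occurrence distances; only the method (average linkage, cut at height 3) reflects the study.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical card labels and pairwise distances: smaller values mean
# participants sorted the two cards together more often.
cards = ["History", "Facts", "Libraries", "Campus Map", "Careers", "Alumni", "News"]
D = np.array([
    [0.0, 0.5, 4.0, 4.5, 5.0, 4.8, 2.0],
    [0.5, 0.0, 4.2, 4.6, 5.1, 4.9, 2.1],
    [4.0, 4.2, 0.0, 1.0, 3.9, 4.1, 4.4],
    [4.5, 4.6, 1.0, 0.0, 4.0, 4.2, 4.5],
    [5.0, 5.1, 3.9, 4.0, 0.0, 1.2, 5.0],
    [4.8, 4.9, 4.1, 4.2, 1.2, 0.0, 4.9],
    [2.0, 2.1, 4.4, 4.5, 5.0, 4.9, 0.0],
])

# Average-linkage hierarchical clustering, then cut the tree at height 3,
# mirroring the separation height used in the study.
Z = linkage(squareform(D), method="average")
groups = fcluster(Z, t=3, criterion="distance")
for card, group in zip(cards, groups):
    print(card, group)
```

With these illustrative distances, the cut at height 3 produces three clusters (e.g. History/Facts/News together); the study's real data yielded 7.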
From the MDS plot with no clustering, some distinctions could be seen at a glance. Certain items sat very close together yet far from others and were likely to be grouped together; items around the top right corner were also close together and likely to form a group of their own. Items on the left side of the graph were harder to distinguish and could plausibly be grouped with surrounding items in several different ways. These cards were in very close proximity and formed the biggest area of variation, corresponding to what the dendrograms showed. The items in this area were likely the hardest for participants to categorize.
The MDS plot with clustering showed the same patterns as the non-clustered version. Items that fell into groups such as 1 (light blue, bottom right corner) or 6 (teal, top right corner) are clearly distinct from other groups and naturally clustered together. The much closer items on the left were divided into three clusters: group 4 (red), group 2 (yellow), and group 7 (brown). This graph confirmed our analysis from the dendrogram: the cards grouped together on the left, relating to the "About" and "Resources" categories, were the key areas of divided categorization.
These findings were also confirmed by the Distance Matrix. The distance matrix visually represents the areas of consideration for tree testing.
The cards shown in lighter blue within the green callout box are where participants had trouble grouping cards; many of these cards were split between different categories. The smaller, darker blue clusters are where participants grouped cards consistently. The lighter blue sections also indicate additional fuzziness in the data that required digging into with our correlation matrix.
Based on the correlation matrix and the Popular Placements Matrix from Optimal Workshop, we identified 14 cards that were split between more than one category. For each of these 14 cards, fewer than 50% of participants placed it in its most popular category, and the share for the second category was even lower, meaning participants were split on which category these cards should fall under.
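The "split card" criterion above is simple to express in code. This sketch uses hypothetical placement counts (not the study's actual data) to show how a card whose top category falls under 50% of placements gets flagged.

```python
# Hypothetical placement counts: how many of the 25 participants placed
# each card in each category. These numbers are illustrative only.
placements = {
    "Campus Map":     {"Campus Life": 11, "Resources": 9, "About": 5},
    "Careers":        {"Resources": 10, "About": 8, "Campus Life": 7},
    "Admissions FAQ": {"Admissions": 21, "Resources": 4},
}

def split_cards(placements, threshold=0.5):
    """Return cards whose most popular category received fewer than
    `threshold` of all placements (i.e. participants were split)."""
    split = []
    for card, counts in placements.items():
        total = sum(counts.values())
        top = max(counts.values())
        if top / total < threshold:
            split.append(card)
    return split

print(split_cards(placements))  # → ['Campus Map', 'Careers']
```

"Campus Map" (11/25 = 44%) and "Careers" (10/25 = 40%) fall under the threshold, while "Admissions FAQ" (21/25 = 84%) does not.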
The card sorting spreadsheet from Donna Spencer's workbook was used to obtain the correlation matrix and the distance matrix, further supporting decision-making with respect to categorization.
Many of the 14 cards from the correlation matrix appear in the large green box of the distance matrix, highlighting the area of focus for future testing. "A to Z index", "Campus Map", "Careers", "Alumni", "Public Safety", and "Libraries" are a few of the items that fall within this area.
Based on this analysis, there are 4 groups of cards that clearly cluster together. The categories these cards are most commonly split among are "Resources", "About", and "Campus Life". These items were the most important to carry into the next step, tree testing.
The tree test was conducted for the areas where participants were most split (Resources, About, and Campus Life) to identify where participants look when prompted with specific tasks. With this in mind, the team selected the following research question as we began the tree test:
The team used 2 trees for the tree test. Each tree's structure housed the categories for five items: Campus Map, Careers, Libraries, Public Safety, and News, chosen based on the results of the card sorting exercise. For each of these five items, a task was created that required users to look for the item within a tree, and the outcomes were recorded.
The tasks were the same for each tree, but the cards within the trees varied in their destination. Additionally, the tasks were randomized and administered from Optimal Workshop. The team also developed an associated null hypothesis (H0) and an alternative hypothesis (H1) per task.
A within-subjects test was first conducted for each tree across the three dependent variables (time, success, and lostness) to check participant behavior within each tree. This was followed by a between-subjects test comparing the two trees on each of the five tasks across the same three dependent variables. 16 participants completed the tasks using Tree 1, and 16 different participants completed the tasks using Tree 2.
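One of the dependent variables measured per task was lostness. The report does not spell out how it was computed; a common definition in tree and navigation testing is Smith's (1996) lostness measure, sketched here under that assumption.

```python
import math

def lostness(total_visited, unique_visited, minimum_required):
    """Smith's lostness measure: 0 is a perfect path; values above ~0.4
    are commonly read as the participant being lost.
    S = total nodes visited, N = unique nodes visited,
    R = minimum nodes required to reach the correct answer."""
    S, N, R = total_visited, unique_visited, minimum_required
    return math.sqrt((N / S - 1) ** 2 + (R / N - 1) ** 2)

# A direct path (visiting exactly the 3 required nodes) scores 0.
print(lostness(3, 3, 3))            # → 0.0
# Backtracking through extra branches raises the score.
print(round(lostness(9, 6, 3), 2))  # → 0.6
```

The two example paths are hypothetical; the point is that revisits (S > N) and detours (N > R) both push the score away from 0.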
The entire tree sort structure included all of the original 7 categories (Admissions, Undergraduate, Graduate, Research, About, Campus Life, and Resources), but the cards tested were only located in the three categories that were determined to have the most issues from the card sort (Resources, Campus Life, and About).
While taking the test, the participants were prompted with the task and selected where they might find the information. After each individual task, the participant was prompted to answer a qualitative question:
At the end of the test, the participants filled out demographic information.
We used RStudio to run the statistical analysis.
For the within-subjects groups, the team first evaluated whether the data were normally distributed using a Shapiro-Wilk test. Since the data in both trees were not normally distributed, a Friedman test was used to compare lostness, time, and success across the tasks. For the variables that differed significantly, a post hoc analysis (Conover test) was run to determine which tasks were significantly different.
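The team ran this pipeline in RStudio; as a rough illustration, the same within-subjects steps can be sketched in Python with SciPy. The timing data below are fabricated for the example, and the Conover post hoc step is only noted in a comment (it lives outside SciPy, e.g. in scikit-posthocs).

```python
import numpy as np
from scipy.stats import shapiro, friedmanchisquare

rng = np.random.default_rng(42)
# Fabricated task-completion times (seconds) for 16 participants on 3 tasks;
# exponential draws mimic the right skew typical of timing data.
task1 = rng.exponential(20, 16)
task2 = rng.exponential(25, 16)
task3 = rng.exponential(60, 16)

# Step 1: Shapiro-Wilk normality check per task; a small p rejects normality.
for times in (task1, task2, task3):
    w, p_norm = shapiro(times)

# Step 2: data are non-normal, so compare the related (within-subjects)
# samples with the non-parametric Friedman test.
stat, p = friedmanchisquare(task1, task2, task3)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")
# If p < 0.05, follow up with a post hoc test (e.g. Conover) to find
# which task pairs actually differ.
```

The same structure applies to the lostness and success variables; only the input columns change.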
The data was then plotted to visually represent the results.
For the between-subjects comparison, since the data were not normally distributed, the team ran a Mann-Whitney U test for each task on time, success, and lostness. For each, the data were first summarized, and then the Mann-Whitney test was run comparing the two trees on each dependent variable.
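The between-subjects step can be sketched the same way. Again this assumes SciPy, and the two samples below are fabricated stand-ins for the Tree 1 and Tree 2 participants' times on a single task.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Fabricated task times for the two independent participant groups.
tree1_time = rng.exponential(60, 16)  # 16 participants, Tree 1
tree2_time = rng.exponential(25, 16)  # 16 different participants, Tree 2

# Independent samples, non-normal data: two-sided Mann-Whitney U test.
u, p = mannwhitneyu(tree1_time, tree2_time, alternative="two-sided")
print(f"U = {u}, p = {p:.4f}")
```

Running this once per task per dependent variable reproduces the shape of the analysis described above.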
The data was then graphed into a boxplot or pie chart. This was done for each of the 5 tasks. The team also examined data from Optimal Workshop.
When there was no significant difference and the data from R did not point definitively in one direction or another, the team used first-click data, data on paths visited, and the overall score from Optimal Workshop to draw conclusions.
Quantitative data were analyzed alongside the qualitative data from post-task and post-study questionnaire responses to build the final inferences.
To summarize, we rejected the null hypothesis for Task 1; our hypothesis was incorrect, as participants were less lost when "Campus Map" was listed under Campus Life. Task 5 did not have significant findings for lostness but did show significant differences in time and success. Findings for Tasks 2, 3, and 4 were not significant, but we made recommendations based on the quantitative and qualitative data gathered, looking at data across the board, including time, success rate, and clicks, to provide the best recommendations.
Overall, Tree 1 performed poorly in our test, with low success rates and high reported difficulty.
We recommend moving forward with Tree 2, with significant results in Tasks 1 and 5 that support this recommendation. Campus Map was most successful when placed under Campus Life, and News was most successful when placed under About. We recommend giving some additional thought and testing to Careers (Task 2) and Libraries (Task 3), as there were no significant findings here and our conclusions are speculative, informed by the additional first click data and feedback from our participants. For Public Safety (Task 4), we recommend looking into renaming the category, with a suggestion to try out Campus Safety instead.
With these adjustments, the Stanford website should prove to be more usable for its visitors, with less time wasted, and less frustration around finding the information they need.
One of our biggest challenges during this test was getting the task question wording right. The questions were difficult to formulate for two reasons. First, it was hard to come up with a realistic scenario without using any of the words that appear in our tree, especially the target item. Second, even with an alternative way of asking participants to find the item, the wording of a question could easily be misinterpreted or could bias participants' actions. We iterated on the questions a few times, but as the data showed, Task 3 clearly misled participants due to its wording.
Another challenge was the analysis and interpretation of non-significant results. For Tasks 2-4, the differences between trees were not statistically significant, which made them difficult to interpret. While we were able to reference Optimal Workshop's additional numbers, ultimately the results cannot be considered conclusive.
We received rich qualitative data from our post-task questions, but there was a notable difference between Tree 1 and Tree 2: 9 participants in Tree 2 skipped every post-task question, raising a potential concern about the integrity of their results. Since not a single participant skipped a question in Tree 1, it is unclear whether this had an impact on our findings.
It can be difficult to describe a task without using the words present in the labels, but care must be taken to do so in order to avoid misinterpretation of the task and leading users.
Non-significant results make it hard to make confident recommendations.
It is essential to require a response to each question to ensure the integrity of the results.