CHIIR '23: Proceedings of the 2023 Conference on Human Information Interaction and Retrieval

SESSION: Session 1: Tasks

Taking Search to Task

  • Chirag Shah
  • Ryen White
  • Paul Thomas
  • Bhaskar Mitra
  • Shawon Sarkar
  • Nicholas Belkin

The importance of tasks in information retrieval (IR) has been long argued for, addressed in different ways, often ignored, and frequently revisited. For decades, scholars made a case for the role that a user’s task plays in how and why that user engages in search and what a search system should do to assist. But for the most part, the IR community has been too focused on query processing and assuming a search task to be a collection of user queries, often ignoring if or how such an assumption addresses the users accomplishing their tasks. With emerging areas of conversational agents and proactive IR, understanding and addressing users’ tasks has become more important than ever before. In this paper, we provide various perspectives on where the state-of-the-art is with regard to tasks in IR, what are some of the bottlenecks in deriving and using task information, and how do we go forward from here. In addition to covering relevant literature, the paper provides a synthesis of historical and current perspectives on understanding, extracting, and addressing task-focused search. To ground ongoing and future research in this area, we present a new framing device for tasks using a tree-like structure and various moves on that structure that allow different interpretations and applications. Presented as a combination of synthesis of ideas and past works, proposals for future research, and our perspectives on technical, social, and ethical considerations, this paper is meant to help revitalize the interest and future work in task-based IR.

Understanding Recruiters’ Information Seeking Behavior in Talent Search

  • Mesut Kaya
  • Toine Bogers

While the rise of online job portals and corporate websites has allowed for easier collection of digital candidate CVs, much of the candidate identification and assessment process, also known as talent search, still requires manual work from recruiters. Recruitment is a professional search domain that has been largely overlooked in IR research, even though better support of recruiters in finding more high-quality candidates could have a big impact on job seekers, companies and society as a whole. Such recruiter support can only be built on top of a more thorough understanding of the information seeking behavior of recruiters when trying to identify the most relevant candidates for open job postings.

In this paper, we present the results of a log-based study of the information seeking process of recruiters at one of Scandinavia’s largest job portals and recruitment agencies. We analyze the behavior of recruiters at different search stages according to the model by Vakkari [24] and distinguish between different types of recruitment tasks. In addition, we contextualize the results of our log analysis using the findings from previously conducted contextual inquiries to help explain our findings. Our results show that both matching and recruiting talent search is a complex task: recruiters usually submit multiple queries during sessions that can last for hours. We also find that the search behavior of recruiters during a recruitment task changes over time: recruiters tend to use more filters, formulate longer and more diverse queries, and spend more time assessing candidates near the end of a session than in the beginning. We also observe some differences in search behavior between the different recruitment tasks.

Understanding Procedural Search Tasks “in the Wild”

  • Bogeum Choi
  • Jaime Arguello
  • Robert Capra

People often search online for procedural (i.e., “how-to”) knowledge. A procedural search task might involve a do-it-yourself project, cooking a dish, fixing a problem, or learning a new skill. Prior research has studied procedural search tasks from different perspectives: estimating the frequency of procedural searches online, understanding how people acquire procedural knowledge in specific contexts, and developing tools to support procedural search. Less research has aimed at deeply understanding procedural search tasks “in the wild”. To bridge this gap, we conducted a survey (N = 128) on Amazon Mechanical Turk. Participants were asked to recall a recent procedural task for which they searched online. Participants were asked open-ended questions about the task itself and their unique situation (e.g., constraints and needs). Additionally, participants provided webpages they found useful in their searches and described the characteristics of the page that made it useful. Finally, they provided useful pieces of information from each selected page and explained what they gained from the information. Using an inductive coding approach, we analyzed participants’ responses to gain insights about: (1) procedural task characteristics, (2) goals, (3) constraints, (4) contextual factors, (5) relevance criteria, and (6) gains obtained from useful information. Based on our results, we discuss important implications for future research and system design.

SESSION: Session 2: Design

Guiding Oral Conversations: How to Nudge Users Towards Asking Questions?

  • Marcel Gohsen
  • Johannes Kiesel
  • Mariam Korashi
  • Jan Ehlers
  • Benno Stein

How could an envisioned voice-based conversational information system assist the information seeker when the seeker does not know how to continue the conversation? The system could explicitly suggest a question to ask after each of its responses, but this approach quickly feels restrictive and repetitive, and it interrupts immersion in the conversation. In this paper, we explore, for the first time, unobtrusive syntactic and auditive modifications of oral system responses to nudge information seekers towards asking about specific topics. We report the results of a crowdsourcing study with 965 participations that investigated the effectiveness and drawbacks of different modifications in three information scenarios.

Eyes on Immersive Search: Eye-Tracking Study of Search Engine Result Pages in Immersive Virtual Environments

  • Austin Ward
  • Bogeum Choi
  • Robert Capra

User interactions with search engine result pages (SERPs) are well researched in desktop and mobile computing environments. However, relatively little work has focused on how users interact with SERPs presented in 3D immersive virtual environments (IVEs) using virtual reality head-mounted displays (VR HMDs). While 2D displays have well-understood methods to present search results (e.g., ranked lists of 10 blue links), 3D IVEs do not yet have established paradigms for presenting search results.

In this paper, we present results from a within-subjects user study to investigate users’ interactions, eye-tracking behaviors, and preferences for four different display arrangements of search results (vertical list, 3x3 grid, 4x4 grid, 4x4 sphere) in a VR HMD across two different task types (find all relevant, pick 3 best). 32 participants completed 5 search trials in 8 experimental conditions (4 displays x 2 task types). Our results show that: (1) participants had a positional bias for the top or top left of SERPs, (2) they perceived the list display as requiring more effort, (3) they perceived a result ordering in the list display but not in the other displays, and (4) they showed a wider variety of navigational patterns in the 4x4 displays that did not require scrolling. We describe implications of the results and insights for presenting search results to users in HMD environments.

The Evolution of Web Search User Interfaces - An Archaeological Analysis of Google Search Engine Result Pages

  • Bruno Oliveira
  • Carla Teixeira Lopes

Web search engines have marked everyone’s life by transforming how one searches and accesses information. Search engines give special attention to the user interface, especially search engine result pages (SERP). The well-known “10 blue links” list has evolved into richer interfaces, often personalized to the search query, the user, and other aspects. More than 20 years later, the literature has not adequately portrayed this development. We present a study on the evolution of SERP interfaces during the last two decades using Google Search as a case study. We used the most searched queries by year to extract a sample of SERP from the Internet Archive. Using this dataset, we analyzed how SERP evolved in content, layout, design (e.g., color scheme, text styling, graphics), navigation, and file size. We have also analyzed the user interface design patterns associated with SERP elements. We found that SERP are becoming more diverse in terms of elements, aggregating content from different verticals and including more features that provide direct answers. This systematic analysis portrays evolution trends in search engine user interfaces and, more generally, web design. We expect this work will trigger other, more specific studies that can take advantage of our dataset.

SESSION: Session 3: Obstacles

Rising of Retracted Research Works and Challenges in Information Systems: Need New Features for Information Retrieval and Interactions

  • Peiling Wang

This perspective paper analyzes the rising threat of retracted scientific works and the challenges of preventing the continued spreading and use of the retracted science; further, a framework is proposed for research and actions to effectively manage retractions in the information ecosystem. The precipitous increase in retractions of scientific publications is real and the complexity of retracting publications challenges current IR systems and people's information behaviors. Retracting published, especially peer-reviewed, papers in prestigious venues is a complex phenomenon involving various entities through often time-consuming processes. These publications may be accessible from the original venues, digital archives, or free-access databases, but these systems differ in retrievability and output. Many systems do not identify the retractions or reasons for retractions; most systems do not treat the retracted paper and its related notices (retraction or correction) as an integrated entity. Studies found that many retracted publications continue to be cited post-retraction as valid science. A new threat is the wide spread of retracted publications on social media. Retracting invalid scientific publications has serious implications in the real world. Based on current findings, we propose (1) a framework for further research; (2) a DOI resolution to integrate the documents related to retraction/correction; (3) a structured facet taxonomy for representing and indexing the retracted, corrected, or republished publications in databases; (4) a retraction registry or database with a personalized AI helper for researchers to track retracted publications; (5) an approach for understanding how retracted publications are circulated on social media.

Driven to Distraction: Examining the Influence of Distractors on Search Behaviours, Performance and Experience

  • Leif Azzopardi
  • David Maxwell
  • Martin Halvey
  • Claudia Hauff

Advertisements, sponsored links, clickbait, in-house recommendations and similar elements pervasively shroud featured content. Such elements vie for people’s attention, potentially distracting people from their task at hand. The effects of such “distractors” are likely to increase people’s cognitive workload and reduce their performance as they need to work harder to discern the relevant from non-relevant. In this paper, we investigate how people of varying cognitive abilities (measured using Perceptual Speed and Cognitive Failure instruments) are affected by these different types of distractions when completing search tasks. We performed a crowdsourced within-subjects user study, where 102 participants completed four search tasks using our news search engine over four different interface conditions: (i) one with no additional distractors; (ii) one with advertisements; (iii) one with sponsored links; and (iv) one with in-house recommendations. Our results highlight a number of important trends and findings. Participants perceived the interface condition without distractors as significantly better across numerous dimensions. Participants reported higher satisfaction, lower workload, higher topic recall, and found it easier to concentrate. Behaviourally, participants issued queries faster and clicked results earlier when compared to the interfaces with distractors. When using the interfaces with distractors, one in ten participants clicked on a distractor, and despite engaging with a distractor for less than twenty seconds, their task time increased by approximately two minutes. We found that the effects were magnified depending on cognitive abilities, with a greater impact of distractors on participants with lower perceptual speed, and for those with a higher propensity of cognitive failures. Distractors, regardless of their type, have negative consequences on a user’s search experience and performance. As a consequence, interfaces containing visually distracting elements are creating poorer search experiences due to the “distractor tax” being placed on people’s limited attention.

Why People Skip Music? On Predicting Music Skips using Deep Reinforcement Learning

  • Francesco Meggetto
  • Crawford Revie
  • John Levine
  • Yashar Moshfeghi

Music recommender systems are an integral part of our daily life. Recent research has seen a significant effort around black-box recommender-based approaches such as Deep Reinforcement Learning (DRL). These advances have led, together with the increasing concerns around users’ data collection and privacy, to a strong interest in building responsible recommender systems. A key element of a successful music recommender system is modelling how users interact with streamed content. By first understanding these interactions, insights can be drawn to enable the construction of more transparent and responsible systems. An example of these interactions is skipping behaviour, a signal that can measure users’ satisfaction, dissatisfaction, or lack of interest. In this paper, we study the utility of users’ historical data for the task of sequentially predicting users’ skipping behaviour. To this end, we adapt DRL for this classification task, followed by a post-hoc explainability (SHAP) and ablation analysis of the input state representation. Experimental results from a real-world music streaming dataset (Spotify) demonstrate the effectiveness of our approach in this task by outperforming state-of-the-art models. A comprehensive analysis of our approach and of users’ historical data reveals a temporal data leakage problem in the dataset. Our findings indicate that, overall, users’ behaviour features are the most discriminative in how our proposed DRL model predicts music skips. Content and contextual features have a lesser effect. This suggests that a limited amount of user data should be collected and leveraged to predict skipping behaviour.

SESSION: Session 4: Disorientation

True or false? Cognitive load when reading COVID-19 news headlines: an eye-tracking study

  • Li Shi
  • Nilavra Bhattacharya
  • Anubrata Das
  • Jacek Gwizdka

Misinformation is an important topic in the Information Retrieval (IR) context and has implications for both system-centered and user-centered IR. While it has been established that the performance in discerning misinformation is affected by a person’s cognitive load, the variation in cognitive load in judging the veracity of news is less understood. To understand the variation in cognitive load imposed by reading news headlines related to COVID-19 claims, within the context of a fact-checking system, we conducted a within-subject, lab-based, quasi-experiment (N=40) with eye-tracking. Our results suggest that examining true claims imposed a higher cognitive load on participants when news headlines provided incorrect evidence for a claim and were inconsistent with the person’s prior beliefs. In contrast, checking false claims imposed a higher cognitive load when the news headlines provided correct evidence for a claim and were consistent with the participants’ prior beliefs. However, changing beliefs after examining a claim did not have a significant relationship with cognitive load while reading the news headlines. The results illustrate that reading news headlines related to true and false claims in the fact-checking context imposes different levels of cognitive load. Our findings suggest that user engagement with tools for discerning misinformation needs to account for the possible variation in the mental effort involved in different information contexts.

It is an online platform and not the real world, I don’t care much: Investigating Twitter Profile Credibility With an Online Machine Learning-Based Tool

  • Junhao Li
  • Ville Paananen
  • Sharadhi Alape Suryanarayana
  • Eetu Huusko
  • Miikka Kuutila
  • Mika Mäntylä
  • Simo Hosio

Social media is now an important source of everyday information. Given the plethora of scandals concerning the rapid spread of misinformation and disinformation on social media, the credibility of the content on these platforms is now a pivotal research area. Much of the existing work on social media credibility focuses on content credibility. In this study, however, we focus on the credibility of the profile as the virtual representation of the content author. We developed a real-time machine-learning-based online tool that assesses the credibility of profiles on Twitter, one of the most common and versatile social media platforms. To investigate user perceptions on credibility-related issues, we used our tool as a stimulus for people to reflect on their profile’s credibility and collected 100 responses. The combination of our quantitative and qualitative analysis reveals that the latest tweets and retweet behavior are two of the most critical factors for profile credibility. It is also observed that people demonstrate a limited interest in their profile credibility but agree that the author’s credibility is of paramount importance. With an open-source tool to assess user credibility on Twitter and a user study to establish its utility, we contribute a timely piece of research on the topic of online credibility.

SESSION: Session 5: Academic Work

Direct, Orienting, and Scenic Paths: How Users Navigate Search in a Research Data Archive

  • Sara Lafia
  • A.J. Million
  • Libby Hemphill

Social scientists increasingly share data so others can evaluate, replicate, and extend their research. To understand the process of data discovery as a precursor to data use, we study prospective users’ interactions with archived data. We gathered data for 98,000 user sessions initiated at a large social science data archive, the Inter-university Consortium for Political and Social Research (ICPSR). Our data reflect four years (2012-16) of users’ interactions with archival resources, including a data catalog, study-level metadata, variables, and publications that cite nearly 10,000 datasets. We constructed a network of user interactions linking website landing (e.g., site entrances) to exit pages, from which we identified three types of paths that users take through the research data archive: direct, orienting, and scenic. We also interpreted points of failure (e.g., drop-offs) and recurring behaviors (e.g., sensemaking) that support or impede data discovery along search paths. We articulate strategies that users adopt as they navigate data search and suggest ways to enhance the accessibility of data, metadata, and the systems that organize each.

How Data Scientists Review the Scholarly Literature

  • Sheshera Mysore
  • Mahmood Jasim
  • Haoru Song
  • Sarah Akbar
  • Andre Kenneth Chase Randall
  • Narges Mahyar

Keeping up with the research literature plays an important role in the workflow of scientists – allowing them to understand a field, formulate the problems they focus on, and develop the solutions that they contribute, which in turn shape the nature of the discipline. In this paper, we examine the literature review practices of data scientists. Data science represents a field seeing an exponential rise in papers, and increasingly drawing on and being applied in numerous diverse disciplines. Recent efforts have seen the development of several tools intended to help data scientists cope with a deluge of research and coordinated efforts to develop AI tools intended to uncover the research frontier. Despite these trends indicative of the information overload faced by data scientists, no prior work has examined the specific practices and challenges faced by these scientists in an interdisciplinary field with evolving scholarly norms. In this paper, we close this gap through a set of semi-structured interviews and think-aloud protocols of industry and academic data scientists (N = 20). Our results, while corroborating other knowledge workers’ practices, uncover several novel findings: individuals (1) are challenged in seeking and sensemaking of papers beyond their disciplinary bubbles, (2) struggle to understand papers in the face of missing details and mathematical content, (3) grapple with the deluge by leveraging the knowledge context in code, blogs, and talks, and (4) lean on their peers online and in-person. Furthermore, we outline future directions likely to help data scientists cope with the burgeoning research literature.

Incubation and Verification Processes in Information Seeking: A Case Study in the Context of Autonomous Learning

  • Yujia Li
  • Chang Liu
  • Preben Hansen

Autonomous learning regards students as the center and orientation rather than teachers’ guidance. During autonomous learning, information seeking is not only a process of interactions with systems and stakeholders, but also a process of learning and cognitive transformation from low-level to high-level activities. This study investigated users’ cognitive process and cognitive paths, as well as the creation strategies for independent topic selection during the information seeking process. We conducted a longitudinal study through tracking interviews with eight university students who planned to select a topic for their theses or independent study. The interviews were conducted weekly to collect data on their cognitive process and seeking behaviors. It is found that: (1) four lower levels of cognitive process (understanding, applying, analyzing and evaluating) often occur during the stages before formulation, while creating occurs in the formulation stage; (2) three cognitive paths for topic selection were identified: "understand - apply - create", "understand - analyze - create" and "understand - analyze - evaluate - create"; and (3) two creation strategies for topic selection according to the duration of creation stages were identified: Incubation and Verification. These results shed light on the design of search systems that could better assist the autonomous learning process and help users accomplish creative learning tasks.

SESSION: Session 6: Inspiration

SearchIdea: An Idea Generation Tool to Support Creativity in Academic Search

  • Catherine Chavula
  • Yujin Choi
  • Soo Young Rieh

Users searching for information in academic contexts often need to compare different perspectives, organize search results, and synthesize topics. To support people’s creative thinking processes while searching for academic information, we developed SearchIdea, a Web-based online tool that enables users to actively interact with search results beyond evaluation and selection. Through its three primary features (search-results, SearchMapper, and IdeaMapper), SearchIdea allows users to add saved search results to SearchMapper for comparison, prioritization, and rearrangement. Using IdeaMapper, users can elicit keywords from search results, brainstorm, and organize ideas while identifying relationships among ideas. We also developed a baseline tool, IdeaPad, which provides users with a simple pad for writing and editing text. We then conducted an evaluation study with 58 students at a university in the United States. The study subjects were assigned to either SearchIdea or IdeaPad and performed two search tasks: (1) generating as many ideas as possible, and (2) selecting the best idea after generating multiple ideas. The results showed that subjects using SearchIdea entered more unique search terms, generated longer queries, and engaged with search results more actively than those who used IdeaPad. The SearchIdea users reported higher ratings for idea generation in terms of synthesizing and organizing ideas than did the IdeaPad users. The findings of our study provide insights into how an idea generation tool can connect search activities with creative thinking processes in order to generate more and better ideas.

The Infinite Index: Information Retrieval on Generative Text-To-Image Models

  • Niklas Deckers
  • Maik Fröbe
  • Johannes Kiesel
  • Gianluca Pandolfo
  • Christopher Schröder
  • Benno Stein
  • Martin Potthast

Conditional generative models such as DALL-E and Stable Diffusion generate images based on a user-defined text, the prompt. Finding and refining prompts that produce a desired image has become the art of prompt engineering. Generative models do not provide a built-in retrieval model for a user’s information need expressed through prompts. In light of an extensive literature review, we reframe prompt engineering for generative models as interactive text-based retrieval on a novel kind of “infinite index”. We apply these insights for the first time in a case study on image generation for game design with an expert. Finally, we envision how active learning may help to guide the retrieval of generated images.

One of Us: a Multiplayer Web-based Game for Digital Evidence Acquisition of Scripts through Crowdsourcing

  • Varvara Kalokyri
  • Alex Borgida
  • Amelie Marian

Digital devices are an integral part of our lives. Through these devices, people produce and save personal data, with or without their explicit awareness. This personal digital information has been exploited by companies, but users find it hard to access and search in a uniform way, due to the heterogeneity and fragmentation of data and the non-uniform access interfaces. By integrating and organizing this information into common kinds of everyday episodes ("scripts") that people engage in, we can help users recall and explore forgotten details of their past. However, being able to recognize such episodes in the user’s personal digital information requires not only script knowledge (e.g., the steps/actions in the script), but also explicit knowledge about the digital traces potentially left behind by each of the actions. In this paper, we present "One Of Us", a web-based multiplayer game, which collects descriptions of different kinds of personal digital traces, by having players identify the digital traces that might be produced by each of the actions in a given script. We report on the results of an experimental study, which gives evidence that our game i) is enjoyable, ii) accounts for uncommon answers, iii) validates and assesses knowledge by having the players vote on others’ responses, thus not requiring a second round of quality assessment, and iv) dynamically acquires new pieces of information.

SESSION: Session 7: Influence

Consumer Health Information Quality, Credibility, and Trust: An Analysis of Definitions, Measures, and Conceptual Dimensions

  • Jiaying Liu
  • Yan Zhang
  • Yeolib Kim

Accessing quality information is becoming increasingly important, given the amount of information on the Internet and the exponential growth of misinformation. Information retrieval research tends to focus on conceptualizing and measuring the concept of relevance and synthesizing research on how users perceive and judge the relevance of information. Comparatively, less attention has been paid to information quality and the synthesis of research on how quality is perceived and judged by information consumers. As an initial effort to bridge the gap, we reviewed the literature concerning information quality on the topic of consumer health information seeking. We included literature on three intertwined concepts – credibility, trust, and quality – which more or less convey the notion of the quality of information. We collected the definitions and measures of the three concepts and identified their overlaps and differences. We further classified the dimensions of these concepts based on an existing hierarchical taxonomy of data quality. We found that the three concepts shared a core set of dimensions: credibility, trustworthiness, objectivity, accuracy, reliability, currency, and recommendations to friends. However, they had different scopes and differed in notable ways. Credibility and trust emphasize the intrinsic features of information, source, or system followed by their relations to users’ feelings, whereas quality emphasizes information or system’s fitness with tasks at hand followed by the intrinsic features of information or system. The results call for more explicit definitions of these concepts in empirical research and greater efforts to theorize users’ perceptions of information quality in increasingly complex and opaque information systems. Such work precedes the design of effective, user-centered, and ethical information systems.

Investigating the Influence of Featured Snippets on User Attitudes

  • Markus Bink
  • Sebastian Schwarz
  • Tim Draws
  • David Elsweiler

Featured snippets that attempt to satisfy users’ information needs directly on top of the first search engine results page (SERP) have been shown to strongly impact users’ post-search attitudes and beliefs. In the context of debated but scientifically answerable topics, recent research has demonstrated that users tend to trust featured snippets to such an extent that they may reverse their original beliefs based on what such a snippet suggests, even when erroneous information is featured. This paper examines the effect of featured snippets in more nuanced and complicated search scenarios concerning debated topics that have no ground truth and where diverse arguments in favor and against can legitimately be made. We report on a preregistered, online user study (N = 182) investigating how the stances and logics of evaluation (i.e., underlying reasons behind stances) expressed in featured snippets influence post-task attitudes and explanations of users without strong pre-search attitudes. We found that such users tend to not only change their attitudes on debated topics (e.g., school uniforms) following whatever stance a featured snippet expresses but also incorporate the featured snippet’s logic of evaluation into their argumentation. Our findings imply that the content displayed in featured snippets may have large-scale undesired consequences for individuals, businesses, and society, and urgently call for researchers and practitioners to examine this issue further.

Explainable Cross-Topic Stance Detection for Search Results

  • Tim Draws
  • Karthikeyan Natesan Ramamurthy
  • Ioana Baldini
  • Amit Dhurandhar
  • Inkit Padhi
  • Benjamin Timmermans
  • Nava Tintarev

One way to help users navigate debated topics online is to apply stance detection in web search. Automatically identifying whether search results are against, neutral, or in favor could facilitate diversification efforts and support interventions that aim to mitigate cognitive biases. To be truly useful in this context, however, stance detection models not only need to make accurate (cross-topic) predictions but also be sufficiently explainable to users when applied to search results – an issue that is currently unclear. This paper presents a study into the feasibility of using current stance detection approaches to assist users in their web search on debated topics. We train and evaluate 10 stance detection models using a stance-annotated data set of 1204 search results. In a preregistered user study (N = 291), we then investigate the quality of stance detection explanations created using different explainability methods and explanation visualization techniques. The models we implement predict stances of search results across topics with satisfying quality (i.e., similar to the state-of-the-art for other data types). However, our results reveal stark differences in explanation quality (i.e., as measured by users’ ability to simulate model predictions and their attitudes towards the explanations) between different models and explainability methods. A qualitative analysis of textual user feedback further reveals potential application areas, user concerns, and improvement suggestions for such explanations. Our findings have important implications for the development of user-centered solutions surrounding web search on debated topics.

SESSION: Session 8: Interpretation

Toward A Two-Sided Fairness Framework in Search and Recommendation

  • Jiqun Liu

As artificial intelligence (AI) assisted search and recommender systems have become ubiquitous in workplaces and everyday lives, understanding and accounting for fairness has gained increasing attention in the design and evaluation of such systems. While there is a growing body of computing research on measuring system fairness and biases associated with data and algorithms, the impact of human biases that go beyond traditional machine learning (ML) pipelines still remains understudied. In this Perspective Paper, we seek to develop a two-sided fairness framework that not only characterizes data and algorithmic biases, but also highlights the cognitive and perceptual biases that may exacerbate system biases and lead to unfair decisions. Within the framework, we also analyze the interactions between human and system biases in search and recommendation episodes. Built upon the two-sided framework, our research synthesizes intervention and intelligent nudging strategies applied in cognitive and algorithmic debiasing, and also proposes novel goals and measures for evaluating the performance of systems in addressing and proactively mitigating the risks associated with biases in data, algorithms, and bounded rationality. This paper uniquely integrates the insights regarding human biases and system biases into a cohesive framework and extends the concept of fairness from a human-centered perspective. The extended fairness framework better reflects the challenges and opportunities in users’ interactions with search and recommender systems of varying modalities. Adopting the two-sided approach in information system design has the potential to enhance both the effectiveness of online debiasing and the usefulness to boundedly rational users engaging in information-intensive decision-making.

Understanding the Cognitive Influences of Interpretability Features on How Users Scrutinize Machine-Predicted Categories

  • Jiaming Qu
  • Jaime Arguello
  • Yue Wang

The goal of interpretable machine learning (ML) is to design tools and visualizations to help users scrutinize a system’s predictions. Prior studies have mostly employed quantitative methods to investigate the effects of specific tools/visualizations on outcomes related to objective performance—a human’s ability to correctly agree or disagree with the system—and subjective perceptions of the system. Few studies have employed qualitative methods to investigate how and why specific tools/visualizations influence performance, perceptions, and behaviors. We report on a lab study (N = 30) that investigated the influences of two interpretability features: confidence values and sentence highlighting. Participants judged whether medical articles belong to a predicted medical topic and were exposed to two interface conditions—one with and one without interpretability features. We investigate the effects of our interpretability features on participants’ performance and perceptions. Additionally, we report on a qualitative analysis of participants’ responses during an exit interview. Specifically, we report on how our interpretability features impacted different cognitive activities that participants engaged with during the task—reading, learning, and decision making. We also describe ways in which the interpretability features introduced challenges and sometimes led participants to make mistakes. Insights gained from our results point to future directions for interpretable ML research.

Weakly Supervised Turn-level Engagingness Evaluator for Dialogues

  • Shaojie Jiang
  • Svitlana Vakulenko
  • Maarten de Rijke

Engagingness is an important measurement for evaluating open-domain conversational systems. The standard approach to evaluating dialogue engagingness is by measuring conversation turns per session (CTPS), which implies that the dialogue length is the main predictor of the user engagement with a dialogue system. The main limitation of CTPS is that it can only be measured at the session level, i.e., once the dialogue is over. But a dialogue system has to continuously monitor user engagement throughout the dialogue session as well. Existing approaches to measuring turn-level engagingness require human annotations for training. We pioneer an alternative approach, Weakly Supervised Engagingness Evaluator (WeSEE), which uses the remaining depth for each turn as a heuristic weak label for engagingness. WeSEE does not require human annotations and also relates closely to CTPS, thus serving as a good learning proxy for this metric. We show that WeSEE achieves new state-of-the-art results on the Fine-grained Evaluation of Dialog dataset (0.38 Spearman correlation coefficient) and the DailyDialog dataset (0.62 Spearman correlation coefficient).

SESSION: Session 9: Elucidation

Much Ado About Gender: Current Practices and Future Recommendations for Appropriate Gender-Aware Information Access

  • Christine Pinney
  • Amifa Raj
  • Alex Hanna
  • Michael D. Ekstrand

Information access research (and development) sometimes makes use of gender, whether to report on the demographics of participants in a user study, as inputs to personalized results or recommendations, or to make systems gender-fair, amongst other purposes. This work makes a variety of assumptions about gender, however, that are not necessarily aligned with current understandings of what gender is, how it should be encoded, and how a gender variable should be ethically used. In this work, we present a systematic review of papers on information retrieval and recommender systems that mention gender in order to document how gender is currently being used in this field. We find that most papers mentioning gender do not use an explicit gender variable, but most of those that do either focus on contextualizing results of model performance, personalizing a system based on assumptions of user gender, or auditing a model’s behavior for fairness or other privacy-related issues. Moreover, most of the papers we review rely on a binary notion of gender, even if they acknowledge that gender cannot be split into two categories. We connect these findings with scholarship on gender theory and recent work on gender in human-computer interaction and natural language processing. We conclude by making recommendations for ethical and well-grounded use of gender in building and researching information access systems.

Building a Better Mousetrap: Tools and Processes for Selling A Company

  • Chelsea Kerr
  • Alexandra Vtyurina
  • Adam Roegiest

It is a fact of life for many start-ups that they must sell part of their company (i.e., fund raising) in order to have enough capital to grow the company to one day successfully exit the market. The unfortunate side effect of this necessity is that it places a large burden on start-ups to respond to information requests from potential buyers which then forces employees to step away from their day jobs to formulate responses. While it has been the norm to respond to such requests using manual review of contracts and other information sources, the increasingly competitive funding market has resulted in growing time pressure for all participants of start-up purchasing endeavours. Furthermore, current technological offerings often fall short of providing optimal support to the start-up and the buyer which continues to reinforce a process that is often cumbersome and chaotic.

In this work, we present an analysis of 19 interviews with regular participants on both sides of the sales process, finding that the main pain points revolve around document management, request tracking, and internal and external collaboration. Based upon this analysis, we describe an early-stage prototype to investigate a possible solution for efficient handling of buyers’ information requests. We recruited 12 participants to test this prototype and found that the main issue was misalignment between language in the tool and participants’ mental models. From these two sets of analyses, we present potential implications and considerations for building tools for infrequent but high-risk and high-reward information tasks.

An Instrument for measuring users’ meta-intents

  • Yuan Ma
  • Tim Donkers
  • Timm Kleemann
  • Jürgen Ziegler

We propose the concept of meta-intents which represent high-level user preferences related to the interaction and decision-making in conversational recommender systems (CRS) and present a questionnaire instrument for measuring meta-intents. We conducted a two-stage user study, an exploratory study with 212 participants on Prolific, and a confirmatory study with 394 participants on Prolific. We obtained a reliable and stable meta-intents questionnaire with 22 question items, corresponding to seven latent factors (concepts). These seven factors cover important interaction preferences and are closely related to users’ decision-making process. For example, the factor dialog-initiative reflects whether users prefer to follow the system’s guidance or ask their own questions in a CRS. We conducted statistical analyses of meta-intents in two domains (smartphones and hotels), and a general chatbot scenario. We also investigated the influence of additional factors (demography, decision-making style) on meta-intents through Structural Equation Modeling (SEM). Our results provide preliminary evidence that the proposed meta-intents are domain and demography (gender, age) independent. They can be linked to the general decision-making style and can thus be instrumental in translating general decision-making factors into more concrete design guidance for CRS and their potential personalization. Meta-intents also provide a basis for future analyses of interaction behavior in CRS and the development of a cognitively founded theoretical framework.

SESSION: Short Papers

Assessing Google Search’s New Features in Supporting Credibility judgments of Unknown Websites

  • Ace Wang
  • Liz Maylin De Jesus Sanchez
  • Anya Wintner
  • Yuanxin Zhu
  • Eni Mustafaraj

This study assesses the awareness and perceived utility of two features Google Search introduced in February 2021: “About this result” and “More about this page”. Google stated that the goal of these features is to help users vet unfamiliar web domains (or sources). We investigated whether the features were sufficiently prominent to be detected by frequent users of Google Search, and their perceived utility for making credibility judgments of sources, in one-on-one user studies with 25 undergraduate college students, who identify as frequent users of Google Search. Our results indicate a lack of adoption or awareness of these features by our participants and neutral-positive perceptions of their utility in evaluating web sources. We also examined the perceived usefulness of nine other domain credibility signals collected from the W3C.

Beyond Accurate Answers: Evaluating Open-Domain Question Answering in Enterprise Search

  • Daniel Xiaodan Zhou
  • Lan Liu
  • Anmol Anubhai
  • Maansi Shandilya
  • Steph Sigalas
  • William Yang Wang
  • Zhiheng Huang

Open-domain question answering (OpenQA) research has grown rapidly in recent years. However, OpenQA usability evaluation in its real-world applications is largely left understudied. In this paper, we evaluated the actual user experience of an OpenQA model deployed in a large tech company’s production enterprise search portal. From qualitative query log analysis and user interviews, our preliminary findings are: 1) There exists a large number of “contingency answers” that cannot be simply evaluated against their face textual values, due to noisy source passages and ambiguous query intents from short keyword queries. 2) Contingency answers contribute to positive search experience by providing “information scents”. 3) Click-through-rate (CTR) is a good user-behavior metric to measure OpenQA result quality, despite the rare existence of “good abandonment”. This exploratory study reveals an often neglected gap between existing OpenQA research and its search engine applications that disconnects the offline research effort from online user experience. We call for reformulating the OpenQA model objective beyond answer face value and developing new datasets and metrics for better evaluation protocols.

Collaboration Patterns and Impact of Sharing at CHIIR

  • Toine Bogers
  • Birger Larsen
  • Marijn Koolen
  • Maria Gäde
  • Mark M. Hall
  • Vivien Petras

We studied the collaboration patterns of CHIIR authors, and found that most papers are collaborative. A core of 33% of the CHIIR researchers are directly connected and frequently co-author, and several disconnected clusters also make frequent CHIIR contributions. We also studied citation impact of the CHIIR papers and show that in relation to research design type, theoretical and empirical papers tend to receive more citations than resource papers. With regards to sharing and re-use, papers that share at least one resource tend to have significantly higher citation impact—in particular when sharing data resources and design resources. Re-using resources does not significantly increase citation impact in itself.

Comparing Interface Layouts for the Presentation of Multimodal Search Results

  • Wolfgang Gritz
  • Christian Otto
  • Anett Hoppe
  • Georg Pardi
  • Yvonne Kammerer
  • Ralph Ewerth

Today’s search engines allow users to discover relevant information in different types of modalities or media, e.g., web pages, text documents, images, or videos. It is, however, a challenging task to present mixed-modality result lists in an effective and easy-to-skim form. The two most commonly used approaches are to present the modalities side-by-side, each in a separate column of the result page; or to separate the modalities into multiple tabs. However, the field lacks a structured investigation of how the column or tab layout influences the users’ perception and usage of multimodal resources in an academic search task. In this paper, we present a user study (N=50) where the participants were asked to accomplish a search task for a fictive computer science seminar at the university. We evaluate the influence of the different layouts on (1) user search behavior (e.g., time until first resource is saved) and (2) the relevance of the selected resources for the task at hand. Finally, we discuss the results and possible implications for the design of multimodal search result presentation.

Designing Supportive Conversational Agents With and For Teens

  • Irene Lopatovska
  • Jessika Davis

The study involved adolescents (teens) in exploring the potential of conversational agents (CAs) to support adolescents during emotionally difficult times. During the online focus groups, participants were asked to talk about common problems they are experiencing and the conversational support they usually receive, design multi-turn supportive conversations that could be programmed into a CA, and offer recommendations for the features of supportive CAs for teens. Participants talked about stressful experiences that were reported in previous studies, i.e., school and family related problems, peer relationships, and self-image, suggesting that CAs can be programmed to anticipate and address stress/negative emotions related to these common stressors. In expressing preferences for conversational support, participants highlighted the importance of both cognitive (informational) and emotional dimensions in agents’ responses. The findings point to the importance of developing CA content that offers teens advice on strategies to cope with negative emotion (advice on behavior change, self-care strategies), and also simply acknowledges their problems and lends an emotional “hearing ear”.

Exploring older people's challenges on online banking/finance systems: Early findings

  • Dain Thomas
  • Gobinda Chowdhury
  • Ian Ruthven

Use of technology is a prerequisite to conduct a normal life and people recurrently use online financial services (i.e., any website which involves monetary transactions, for example, Internet banking, online shopping, transport service websites etc.) in everyday contexts. Information interactions are affected by the nature of specific digital services as well as a number of behavioural characteristics of users. Older people aged 65 and over are often digitally excluded from the use of online financial services. We identified some reasons for non-use of financial services through semi-structured interviews with older people and employees who assist older people with digital services. Preliminary findings demonstrated that fear of financial scams, lack of digital skills and lack of help are some of the main factors which inhibit them from using digital financial services efficiently.

From 10 Blue Links Pages to Feature-Full Search Engine Results Pages - Analysis of the Temporal Evolution of SERP Features

  • Bruno Oliveira
  • Carla Teixeira Lopes

Web Search Engine Results Pages (SERP) are among the best-known and most used web pages. These pages started as simple “10 blue links” pages, but the information in SERP currently goes way beyond these links. Several features have been included in these pages to complement organic and sponsored results and attempt to provide answers to the query instead of just pointing to websites that might deliver that information. In this work, we analyze the appearance and evolution of SERP features in the two leading web search engines, Google Search and Microsoft Bing, using a sample of SERP from the Internet Archive. We found that SERP are becoming more diverse in terms of elements, aggregating content from different verticals and including more features that provide direct answers.

How to Make an Outlier? Studying the Effect of Presentational Features on the Outlierness of Items in Product Search Results

  • Fatemeh Sarvi
  • Mohammad Aliannejadi
  • Sebastian Schelter
  • Maarten de Rijke

In two-sided marketplaces, items compete for attention from users since attention translates to revenue for suppliers. Item exposure is an indication of the amount of attention that items receive from users in a ranking. It can be influenced by factors like position bias. Recent work suggests that another phenomenon related to inter-item dependencies may also affect item exposure, viz. outlier items in the ranking. Hence, a deeper understanding of outlier items is crucial to determining an item’s exposure distribution. In this work, we study the impact of different presentational e-commerce features on users’ perception of outlierness of an item in a search result page. Informed by visual search literature, we design a set of crowdsourcing tasks where we compare the observability of three main features, viz. price, star rating, and discount tag. We find that various factors affect item outlierness, namely, visual complexity (e.g., shape, color), discriminative item features, and value range. In particular, we observe that a distinctive visual feature such as a colored discount tag can attract users’ attention much more easily than a high price difference, simply because of visual characteristics that are easier to spot. Moreover, we see that the magnitude of deviations in all features affects the task complexity, such that when the similarity between outlier and non-outlier items increases, the task becomes more difficult.

How we Work, Share, and Re-use at CHIIR

  • Toine Bogers
  • Maria Gäde
  • Mark Michael Hall
  • Marijn Koolen
  • Vivien Petras
  • Birger Larsen

In this paper, we present the results of an initial study of the research, sharing, and re-use practices at the CHIIR conference through a systematic analysis of all CHIIR papers published from 2016 to 2022. We find that CHIIR is a conference predominantly focused on empirical, multi-methods research that over the years has undergone a focusing in terms of the type of research methods that are being used. A modest number of papers re-use existing data and design resources, but infrastructure component re-use is much more rare. Only a fraction of CHIIR papers actually share their own resources, which suggests that there is much to gain in terms of reproducibility of research presented at CHIIR and could potentially be used to support changes in reviewing practices.

I Don’t Care How Popular You Are! Investigating Popularity Bias in Music Recommendations from a User’s Perspective

  • Bruce Ferwerda
  • Eveline Ingesson
  • Michaela Berndl
  • Markus Schedl

Recommender systems are designed to help us navigate through an abundance of online content. Collaborative filtering (CF) approaches are commonly used to leverage behaviors of others with a similar taste to make predictions for the target user. However, CF is prone to introduce or amplify popularity bias in which popular (often consumed or highly ranked) items are prioritized over less popular items. Many computational metrics of popularity biases — and resulting algorithmic (un)fairness — have been presented. However, it is largely unclear whether these metrics reflect human perception of bias and fairness. We conducted a user study with 170 participants to explore how users perceive recommendation lists created by algorithms with different degrees of popularity bias. Our results show — surprisingly — that popularity biases in recommendation lists are barely observed by users, even when corresponding bias/fairness metrics clearly indicate them.

Mapping dementia caregivers’ comments on social media with evidence-based care strategies for memory loss and confusion

  • Ning Zou
  • Yuelyu Ji
  • Bo Xie
  • Daqing He
  • Zhimeng Luo

Dementia caregivers widely turn to social media for much needed information and support. Prior research on caregivers’ online information exchange has focused on the original questions or posts, without including peer comments in response to those original questions, which does not provide a complete picture of online information exchanges between caregivers and their peers. This paper provides a preliminary analysis of a subset of data from a larger project and suggests how peer comments might match with evidence-based care strategies for memory loss and confusion. A total of 954 peer comments on 114 Reddit posts for dementia memory loss and confusion were collected and mapped with 5 evidence-based strategies generated from 17 care strategies from 3 credible websites. Our results report how well peer comments on Reddit can map to existing evidence-based strategies, and provide preliminary evidence supporting the necessity of providing tailored information based on the patient’s characteristics, stages of disease, and the progression level of memory loss and confusion.

Nested Contexts of Music Information Retrieval: A Framework of Contextual Factors

  • Yuyu Yang
  • Rob Capra

Music listening is heavily influenced by contexts, and contextual factors can shape users’ interaction with music information retrieval (MIR) systems. To better design context-sensitive user experiences in MIR systems, in this paper, we present a review of prior studies on how contexts are associated with user behavior in MIR systems. Contextual factors considered include interaction design, age, personality, time of day, activity, motivation, nationality, etc. Based on the review, we introduce a framework to consider these contextual factors in a consistent and organized way. The framework is adapted from Ingwersen and Järvelin’s 2006 nested contexts framework, and has four layers: 1) MIR/system contexts that focus on MIR systems themselves, including both hardware and software; 2) situational contexts that describe varied and transient daily situations where users interact with MIR systems; 3) personal contexts that focus on the more stable personal characteristics; and 4) social and cultural contexts that describe the characteristics of users’ environments. We also present an example to illustrate how to systematically analyze user contexts by using the framework. Finally, we discuss several areas for possible future studies.

Quality Conversations and Considerations on Reddit

  • Frans Van Der Sluis
  • Julien Faure
  • Sofie Phutachard Homnual

Whilst individual users seem rather ill-equipped to judge information quality on their own, conversations between users about information quality on sensitive topics are found to be characterized by personal attacks and disputational talk. In this study, we investigate whether users are better able to judge information qualities during conversations on everyday topics instead. We analyze a total of 268 quality and credibility judgements from four discussion threads on Reddit on their positivity and conversational style. Results confirm the occurrence of quality conversations with a constructive, positive style on everyday topics. We conclude that conversations are more constructive when identity concerns are limited.

Representing Tasks with a Graph-Based Method for Supporting Users in Complex Search Tasks

  • Shawon Sarkar
  • Maryam Amirizaniani
  • Chirag Shah

Despite the considerable advancements in modern search systems for assisting users in search tasks of varying types, support for complex tasks that call for multi-round interactions remains challenging. Identifying users’ tasks is essential to understanding their evolving information needs and search goals during search sessions to simulate and achieve real-time adaptive search retrievals; thus, it is a crucial research thrust in interactive information retrieval (IIR). While a series of descriptive and formal models have been proposed to characterize complex information search sessions, only a few focus on leveraging dynamic task features in search personalizations to support users in different task stages in an adaptive fashion. This preliminary study presents a heterogeneous graph neural network model for extracting and representing tasks to better understand users’ interactive search processes by connecting tasks with search interactions. Our approach’s novelty lies in our application of task representation learning, which enables systems to extract hidden task information from users’ search behaviors. The results of our evaluative experiments on TREC Session track data highlight the value of our proposed task representation model and illustrate a promising research direction on task-oriented intelligent systems.

Rethinking Serendipity in Recommender Systems

  • Denis Kotkov
  • Alan Medlar
  • Dorota Glowacka

Recommender systems suggest items, such as movies or books, to users based on their interests. These systems often suggest items that users are either already familiar with or could easily have found on their own without additional assistance. To overcome these problems, recommender systems aim to suggest serendipitous items. While there is a lack of consensus in the recommender systems research community on the definition of serendipity, it is often conceptualized as a complex combination of relevance, novelty and unexpectedness. However, the common understanding and original meaning of serendipity is conceptually broader, requiring serendipitous encounters to be neither novel nor unexpected. Recent work in the social sciences has highlighted the various ways that serendipity can manifest, leading to a more generalized definition of serendipity. We argue that the study of serendipity in recommender systems would benefit from considering items that are serendipitous under this more general definition, giving us a deeper understanding of the item characteristics and behavioral impact of serendipitous recommendations. These findings will help us to better optimize recommender systems for serendipity. In this paper, we explore various definitions of serendipity and propose a novel formalization of what it means for recommendations to be serendipitous. Lastly, we present an experimental design for how serendipity can be measured in a deployed recommender system.

RULKNE: Representing User Knowledge State in Search-as-Learning with Named Entities

  • Dima El Zein
  • Arthur Câmara
  • Célia Da Costa Pereira
  • Andrea Tettamanzi

A reliable representation of the user’s knowledge state during a learning search session is crucial to understand their real information needs. When a search system is aware of such a state, it can adapt the search results and provide greater support for the user’s learning objectives. A common practice to track the user’s knowledge state is to consider the content of the documents they read during their search session(s). However, most current work ignores entity mentions in the documents, which, when linked to knowledge graphs, can be a source of valuable information regarding the user’s knowledge. To fill this gap, we extend RULK (Representing User Knowledge in Search-as-Learning) with entity linking capabilities. The extended framework, RULKNE, represents and tracks user knowledge as a collection of such entities. It eventually estimates the user knowledge gain (the learning outcome) by measuring the similarity between the represented knowledge and the learning objective. We show that our methods allow for up to 10% improvements when estimating user knowledge gains.

The effect of research video abstract presentation style on viewer comprehension and engagement

  • Alice Li
  • Heather O'Brien
  • Luanne Sinnamon

This study investigated the effect of video abstract (VA) presentation style (slideshow versus animation) on viewer comprehension and user engagement. Video abstracts, short video presentations of journal articles, were selected and used in a randomized between-subjects experiment (N = 290) with Amazon Mechanical Turk (MTurk) participants. The Cognitive Theory of Multimedia Learning (CTML) informed the selection of VAs. Barrett’s Taxonomy of Cognitive and Affective Dimensions of Reading Comprehension (Barrett’s Taxonomy) was used to develop comprehension measures focused on recall and summarization, and user engagement was measured using a questionnaire. The study found that: 1) comprehension outcomes did not vary between slideshow and animation style VAs, 2) animation VAs were perceived to be more engaging than slideshow VAs, and 3) user engagement was weakly negatively correlated with comprehension scores. In other words, animation VAs attracted viewers with their content, but did not lead to increased comprehension. In fact, viewers in the study who reported higher levels of engagement had slightly lower comprehension outcomes.

Untangling Cognitive Processes Underlying Knowledge Work

  • Ginar Santika Niwanputri
  • Elaine Toms
  • Andrew Simpson

In a post-industrial society, the workplace is dominated primarily by Knowledge Work, which is achieved mostly through human cognitive processing, such as analysis, comprehension, evaluation, and decision-making. Many of these processes have limited support from technology in the same way that physical tasks have been enabled through a host of tools from hammers to shovels and hydraulic lifts. To develop a suite of cognitive tools, we first need to understand which processes humans use to complete work tasks. In the past century several classifications (e.g., Bloom’s) of cognitive processes have emerged, and we assessed their viability as the basis for designing tools that support cognitive work. This study re-used an existing data set composed of interviews of environmental scientists about their core work. While the classification uncovered many instances of cognitive processes, the results showed that the existing cognitive process classifications do not provide a sufficiently comprehensive deconstruction of the human cognitive processes; the work is quite simply too abstract to be operational.

Using Data-Prompted Interviews in Interactive Information Retrieval Research: A Reflection on The Study of Self-Efficacy When Learning Using Search

  • Amelia Cole
  • Heather O'Brien

Capturing authentic information behaviors, feelings, and attitudes in natural settings is a challenge in interactive information retrieval (IIR) research. Quantitative data collection is useful for understanding IIR at scale. Yet common data collection techniques, such as surveys, lack participants’ reasoning behind their choices. Qualitative data collection, such as think-aloud and think-after protocols, supports understanding how people behave, think, and feel immediately following an experience, but may be subject to cognitive biases and is challenging to deliver longitudinally. IIR studies have an opportunity to enhance ecological validity by using mixed and multi-method studies. This paper applies learnings from a longitudinal mixed method study that combines the 'in-situ' benefits of collecting real-time data over time with the benefits of retrospectives. This approach has potential to advance search-as-learning (SAL) research by providing a contextualised approach to longitudinal data collection and can be used to gain deeper insights into subjective experiences.

“Webcomics Archive? Now I'm Interested”: Comics Readers Seeking Information in Web Archives

  • Linda Berube
  • Stephann Makri
  • Ian Cooke
  • Ernesto Priego
  • Stella Wisdom

There is a longstanding tradition of understanding information needs and interaction behavior across different user groups to inform the design of digital products and services. There is a gap in such research of comics readers, specifically how they seek and interact with the information and interfaces of web-based archives provided by cultural institutions. For example, while information interaction research has now recognized that information-seeking for leisure and pleasure are important domains of study - consuming information based in fiction can help us escape to exciting worlds by captivating narratives - and while there have been studies of how people find fiction to read, there have to our knowledge been no user-centered studies on how people find and consume digital comics. This exploratory study provides an enriched understanding of the information needs and interaction behaviors of digital comics readers and how that understanding can inform the design of digital platforms to better support them.

SESSION: Demonstrations

A Prototype “Debugger” for Search Strategies

  • Jingfan Zang
  • Tony Russell-Rose

Knowledge workers such as healthcare information professionals, legal researchers, and librarians need to create and execute search strategies that are effective, efficient and error-free. The traditional solution is to use command-line query builders offered by proprietary database vendors. However, these are based on an archaic approach that offers limited support for the validation and optimisation of their output. Consequently, there are often errors in search strategies reported in the literature that prevent them from being effectively reused or extended. In this paper, we demonstrate a new approach that takes inspiration from software development practice and applies it to the challenge of search strategy formulation. We demonstrate a prototype ‘debugger’ which provides insight into the construction and semantics of search strategies, allowing users to inspect, understand and validate their behaviour and effects. This has the potential to eliminate many sources of error and offers new ways to validate, optimise and re-use search strategies and best practices.

CS-lol: a Dataset of Viewer Comment with Scene in E-sports Live-streaming

  • Junjie H. Xu
  • Yu Nakano
  • Lingrong Kong
  • Kojiro Iizuka

Billions of live-streaming viewers share their opinions on scenes they are watching in real-time and interact with the event, commentators as well as other viewers via text comments. Thus, it is necessary to explore viewers’ comments with scenes in E-sport live-streaming events. In this paper, we developed CS-lol, a new large-scale dataset containing comments from viewers paired with descriptions of game scenes in E-sports live-streaming. Moreover, we propose a task, namely viewer comment retrieval, to retrieve the viewer comments for the scene of the live-streaming event. Results on a series of baseline retrieval methods derived from typical IR evaluation methods show that the task is challenging. Finally, we release CS-lol and baseline implementation to the research community as a resource.

DisETrac: Distributed Eye-Tracking for Online Collaboration

  • Bhanuka Mahanama
  • Mohan Sunkara
  • Vikas Ashok
  • Sampath Jayarathna

Coordinating viewpoints with another person during a collaborative task can provide informative cues on human behavior. Despite the massive shift of collaborative spaces into virtual environments, versatile setups that enable eye-tracking in an online collaborative environment (distributed eye-tracking) remain unexplored. In this study, we present DisETrac, a versatile setup for eye-tracking in online collaborations. Further, we demonstrate and evaluate the utility of DisETrac through a user study. Finally, we discuss the implications of our results for future improvements. Our results indicate a promising avenue for developing versatile setups for distributed eye-tracking.

Drag-and-Drop Query Refinement and Query History Visualization for Mobile Exploratory Search

  • Mohammad Hasan Payandeh
  • Miriam Boon
  • Dale Storie
  • Veronica Ramshaw
  • Orland Hoeber

Conducting exploratory searches within digital libraries requires that searchers revise, refine, and reformulate their queries multiple times. Challenges that searchers of digital public libraries face include choosing how to refine their queries and making spelling or typographical errors. These are compounded when using mobile devices, where typing is time-consuming and error-prone. Conducting searches in a mobile context adds yet another challenge: the possibility of being interrupted and losing track of what was being done. In this paper we demonstrate a novel digital public library search interface tuned for mobile device use, which was designed to address these challenges through two key features: drag-and-drop query refinement and query history visualization. This work represents an example of how thoughtful search interface design and the judicious use of visualization techniques can be used to enhance exploratory search processes within digital public libraries.

FairRecKit: A Web-based Analysis Software for Recommender Evaluations

  • Christine Bauer
  • Lennard Chung
  • Aleksej Cornelissen
  • Isabelle van Driessel
  • Diede van der Hoorn
  • Yme de Jong
  • Lan Le
  • Sanaz Najiyan Tabriz
  • Roderick Spaans
  • Casper Thijsen
  • Robert Verbeeten
  • Vos Wesseling
  • Fern Wieland

FairRecKit is a web-based analysis software that supports researchers in performing, analyzing, and understanding recommendation computations. The idea behind FairRecKit is to facilitate the in-depth analysis of recommendation outcomes considering fairness aspects. With (nested) filters on user or item attributes, metrics can easily be compared across user and item subgroups. Further, (nested) filters can be used on the dataset level; this way, recommendation outcomes can be compared across several sub-datasets to analyze for differences considering fairness aspects. The software currently features five datasets, 11 metrics, and 21 recommendation algorithms to be used in computational experimentation. It is open source and developed in a modular manner to facilitate extension. The analysis software consists of two components: A software package (FairRecKitLib) for running recommendation algorithms on the available datasets and a web-based user interface (FairRecKitApp) to start experiments, retrieve results of previous experiments, and analyze details. The application also comes with extensive documentation and options for result customization, which makes for a flexible tool that supports in-depth analysis.

Grep-BiasIR: A Dataset for Investigating Gender Representation Bias in Information Retrieval Results

  • Klara Krieg
  • Emilia Parada-Cabaleiro
  • Gertraud Medicus
  • Oleg Lesota
  • Markus Schedl
  • Navid Rekabsaz

The content provided by information retrieval (IR) systems can reflect existing societal biases and stereotypes. Such biases in retrieval results can lead to further establishing and strengthening stereotypes in society and also in the systems. To facilitate the studies of gender bias in the retrieval results of IR systems, we introduce Gender Representation-Bias for Information Retrieval (Grep-BiasIR), a novel thoroughly-audited dataset consisting of 118 bias-sensitive neutral search queries. The set of queries covers a wide range of gender-related topics, for which a biased representation of genders in the search result can be considered as socially problematic. Each query is accompanied by one relevant and one non-relevant document, where the document is also provided in three variations of female, male, and neutral. The dataset is available at https://github.com/KlaraKrieg/GrepBiasIR.

irchiver: A Full-Resolution Personal Web Archive for Users and Researchers

  • Jeff Huang
  • Jing Qian

irchiver is a personal web archive tool for users, which also provides an opportunity for information retrieval researchers to access naturalistic long-term browsing histories. It offers users the ability to capture and search their web archives, which are stored in full-resolution images on their local filesystem. Researchers can request these files, or develop a plugin that triggers when a new image is captured. irchiver runs as a Windows background process using a hook and capture technique that works on all 5 major desktop browsers. The first author has been using irchiver for 12 months, encountering unexpected benefits in archive retrieval, change detection, search functionality, and content recovery.

The Evolution of User Knowledge during Search-as-Learning Sessions: A Benchmark and Baseline

  • Dima El Zein
  • Célia Da Costa Pereira

In this paper, we present a new benchmark collection that shows how 404 users’ knowledge changed over the course of a search-as-learning session. We estimate the knowledge a user gains from each visited document and monitor knowledge change on a document-by-document basis. We describe the specifics of how this collection was created and provide a use case that illustrates potential future applications.

SESSION: Tutorials

Qualitative Research in Information Interaction

  • Dana McKay
  • Stephann Makri
  • George Robert Buchanan

Qualitative research is a fundamental part of investigating information interactions, but is often challenging to do well. There are contesting theories and approaches, and combining quantitative and qualitative work adds further complexity to qualitative research. In this tutorial, we provide attendees with a comprehensive toolkit of qualitative methods. We will cover a core of flexible, proven and rigorous practice that will give even a novice a thorough grounding in the decisions and approaches to develop high-quality rigorous research. The tutorial leaders come from different backgrounds, and can provide insight into how to adopt qualitative methods whatever your own disciplinary origins. On completion of the tutorial, you will be able to design and execute sound qualitative investigations that lead to high-impact information interaction research.

SESSION: Workshops

Made to Measure: A Workshop on Human-Centred metrics for information seeking

  • George Robert Buchanan
  • Dana McKay
  • Charles Clarke

Measurement is a core tool for improving information interaction. What is measured and how it is measured influences how we perceive the efficacy of retrieval and other interactions. While some measures have become ubiquitous, such as precision and recall, there are many other facets of information interaction whose measures are either immature, weakly accepted or only used occasionally. This workshop seeks to develop our metrics for information interaction and retrieval, finding novel measures, or innovative adaptations of existing measures, to create a better experimental toolset to improve our understanding of information interaction, and help develop more effective systems and interactions that support it.

SESSION: Doctoral Consortium

Designing Social Robots to Accommodate Diversity, Equity, and Inclusion in Human-Robot Interaction

  • Jessica Barfield

As humanoid robots interact with people in social contexts, it is important to design robots that meet the information needs and values of a diverse group of users. This is an emerging topic of interest for researchers in information sciences because robots displaying social skills are beginning to assist users in their search for, retrieval of, and access to information. However, current robots designed in the United States, Europe, and Asia may not reflect the values and diversity of the users expected to interact with robots and thus may not be inclusive for Hispanic, Muslim, and other user populations. With this observation in mind, and drawing on Social Identity Theory, my research is aimed at extending past studies on social categorization of individuals to the domain of human-robot interaction, in the context of information search and retrieval and other applied tasks. For this relatively unexplored topic of research, I aim to develop guidelines for the design of human-robot interfaces that are inclusive and thus accommodate the diversity of users expected to interact with social robots.

Do Social Media Users Change Their Beliefs to Reflect those Espoused by Other Users?

  • Hmdh Alknjr

Conformity, within the context of social media usage, describes users who change their beliefs to concur with those of others. Despite social conformity being regarded as a widely occurring social phenomenon among social media users, only limited studies are available to date on online cohorts. The aim of this research is to investigate the influence of online conformity and comments posted by other individuals when readers respond to news items on social media. It also aims to explore how other readers can influence the appraisal of the trustworthiness of such news. Additionally, the study investigates key factors that may influence trustworthiness appraisal, along with determining its impact. One of the expected outcomes of this research is that comments or replies shared online can lead to conformity and may assist in changing the reader’s opinion to concur with the opinions espoused by others and may influence their appraisal of news.

Do users trust search engines? And if so, why?: Developing a trust measure and applying it in an experiment

  • Helena Häußler

Nowadays, users trusting search engines seems a matter of course. However, in the face of critical evaluation of information, the apparent authority of search engines should be investigated. Therefore, the question arises again as to what extent users trust search engines and what their reasons for trusting them are. It is vital to untangle trust, trustworthiness, and trust-related behavior to address this question, a shortcoming of previous studies. The clarification helps to find evidence on the causes and effects of trust. Since there is not an adequate trust measure for technical artifacts to date, one will be developed with the help of a controlled laboratory study and validated with an online questionnaire. The measure will be applied in an online experiment to scenarios from the health and finance domain and the search engines Google and Ecosia. The expected results determine misplaced and legitimate trust in search engines. Consequently, this furthers the discussion among civil society, researchers, and policymakers on the societal consequences of trust, the role of search engines, and the necessary user skills. Additionally, the developed trust measure may be applied to novel AI applications.

How search engine marketing influences user knowledge gain: Development and empirical testing of an information search behavior model

  • Sebastian Schultheiß

People use search engines to find answers to questions related to their health, finances, or other socially relevant issues. However, most users are unaware that search results are considerably influenced by search engine marketing (SEM). SEM measures are driven by commercial, political, or other motives. Due to these motivations, two questions arise: What information quality is mediated through SEM? And how is collecting documents of different quality affecting user knowledge gain? Neither question is considered by existing models of information behavior. Hence, the doctoral research project described in this paper aims to develop and empirically test an information search behavior model on the influences of SEM on user knowledge gain and thereby contribute to the search as learning body of research.

Investigating Online Browsing Practices

  • Huiwen Zhang

It has been long acknowledged that browsing is an important information seeking strategy that should be well-supported by digital systems. However, a recent evaluation showed that existing digital systems do not meet the basic browsing requirements of users. That motivates this research to improve the existing digital support for browsing. To support browsing in the digital context, we need to know what browsing practices need to be supported. Previous literature illustrated fundamental browsing practices from multiple perspectives. Many empirical studies provided a detailed description of how these fundamental browsing practices happened in the physical context. This research will extend the existing understanding of browsing to the digital context and provide an enhanced foundation for future browsing designs.

Measuring In-Task Emotional Responses to Address Issues in Post-Task Questionnaires

  • Abbas Pirmoradi Bezanjani

When evaluating interactive information retrieval (IIR) interfaces, it is common to collect data using subjective measures such as satisfaction, ease of use, usefulness, and user engagement. However, as these are collected post-task, they serve as surrogate measures for what occurred in the midst of the search activities. Further, such approaches may be subject to recency effects, where the last action in the search process influences the searchers’ opinions about the overall process. With recent improvements in facial emotion classification approaches, we propose that measuring emotional responses may provide a better indication of what is happening throughout search tasks. In this research, we present an approach for collecting real-time emotional responses during a search activity using consumer-grade front-facing cameras and a method of aligning these with search interface feature use. To validate the effectiveness of the approach, we have conducted a controlled laboratory study in which we manipulated the quality of the search results in order to determine if we can detect expected emotional responses, whether search behaviours influence these emotional responses, and whether recency effects are present in post-task measures. The preliminary results of this study show that our approach is reliable for detecting emotional responses when searchers experience positive and negative emotions throughout the search process, can isolate which interactive elements were used when positive and negative emotional responses were experienced, and can illustrate how recency effects are present in post-task measures. Our upcoming study will investigate how our approach can be used to evaluate novel search interfaces. We will develop a novel search interface and evaluate it using our approach. Finally, we will create a dashboard to monitor academic literature. Using the same approach, we will demonstrate that it can extend beyond traditional search interfaces and into more general interface assessment.

Online Information Seeking and Searching Behavior of First Time Southeast Asian Fathers

  • Kidung Ageng

Pregnancy-related information-seeking behavior of fathers is well-researched in developed countries. However, there is limited information on this from developing countries, especially in Southeast Asia. This proposed research study intends to examine the online information-seeking and searching behavior of first-time expectant Southeast Asian fathers when searching for pregnancy information. We explore fathers’ Online Pregnancy-Related Information Seeking Behavior (OPRISB) based on the theoretical framework of information-seeking and searching behavior. The results of this research study could potentially benefit content creators of pregnancy information in designing and presenting pregnancy information aimed at Southeast Asian fathers.

Towards Understanding and Supporting Exploratory Searches

  • Ayah Soufan

Exploratory search is an intuitive concept in interactive information retrieval. It is known that searchers use exploratory search strategies when learning and investigating a new domain. Many definitions for Exploratory Search have been proposed. However, the main dimensions involve high uncertainty with respect to the problem context, the user expertise, and the search process. In this work, we study exploratory search in the literature, and we provide a conceptual model of exploratory search. We also conduct a user study to examine how literature searches are exploratory and what factors influence the exploratory dimensions and characteristics. Moreover, we review the exploratory support tasks, and we try to better design and evaluate exploratory user interfaces.

What tasks emerge from Knowledge Work?

  • Stephanie Carmen Segura-Rodas

Over the 21st century, work has changed from physical labour to knowledge work that is mostly cognitive. To date, very few tools have been developed to facilitate its completion. While much has been written about tasks and many categorisations of tasks have been developed, the work is too broad and lacks discriminatory power. While we may know the task is, for example, to write a report, we do not know what goes on inside the task. Without knowing what tasks are performed at a detailed level, it is difficult to design appropriate tools to support task completion. This research will first identify the key tasks within a particular domain, isolate the various elements and then extract the detailed subtasks. In the second part, these subtasks will be examined in other domains. The research will be conducted primarily using semi-structured interviews, and the results will be thematically analysed. The outcome of this work will be a task structure that identifies the key points in task completion that could be better facilitated by tools.