Tutorials will take place on Sunday, 28th July in the Arts Block, Trinity College Dublin.
Full-Day Tutorials | 08:30 – 17:30
Building Test Collections: An Interactive Guide for Students and Others Without their own Evaluation Conference Series
Ian Soboroff (National Institute of Standards and Technology, USA)
Room: 3051, Arts Block, TCD
Abstract:
I will cover the process of building and validating IR test collections. The goal is not for attendees to kick off their own evaluation campaigns, but to enable them to consider whether they may be able to build their own test collections to support their research. At the end of the day, attendees will be familiar with the history of the test collection evaluation paradigm; the design process starting from identifying user tasks and abstracting them; different ways of establishing a document collection; methods for operationalizing relevance; strategies for identifying items in the collection to label, including pooling and sampling; and methods for measuring and validating test collections.
Expected existing knowledge of participants:
Attendees should be familiar with experimental IR methods, particularly measuring a search system using a test collection. Attendees should come prepared with a specific current need for data, and/or details about a collection building effort currently in process; the afternoon session will be devoted to collaboratively discussing these needs and identifying solutions.
Bio:
Dr. Ian Soboroff is the leader of the Retrieval Group at the National Institute of Standards and Technology (NIST). He has co-authored many publications in IR, evaluation, and test collection building. Ian has built test collections for search, filtering, novelty, web, social media, intranet access and other domains. He most recently taught as part of the PROMISE-NoE Winter School on the subject of IR evaluation.
Entity Linking and Retrieval
Edgar Meij (Yahoo! Research), Krisztian Balog (University of Stavanger, Norway) and Daan Odijk (University of Amsterdam, The Netherlands)
Room: 3071, Arts Block, TCD
Abstract:
This full-day tutorial presents a comprehensive introduction to entity linking and retrieval.
Part I provides a detailed overview of entity linking, which addresses identifying and disambiguating entity occurrences in unstructured text. We introduce the fundamental concepts and principles underlying entity linking, and detail state-of-the-art algorithms including unsupervised solutions, graph-based methods, and feature-based approaches in a machine learning setting. We continue with applications of entity linking for IR and conclude this part with a discussion of evaluation methodologies and initiatives in the context of entity linking.
Part II focuses on entity retrieval and begins with a study of scenarios where explicit representations of entities are available in the form of, e.g., Wikipedia pages or RDF triples. We then continue in a setting with more complex queries, requiring evidence to be collected and aggregated from massive volumes of unstructured textual data (with the potential help of some structured data). Such complex queries require a combination of techniques from both entity linking and entity retrieval. Throughout Part II, two main families of models are discussed: generative language models and discriminative feature-based models. Both the entity linking and entity retrieval parts are anchored in recent evaluation efforts conducted at standard benchmarking campaigns such as INEX, TAC, and TREC. We introduce test collections, tasks, evaluation methodology, and experimental results from these evaluation initiatives.
Part III concludes the tutorial with an overview and hands-on comparative analysis of applications and publicly available toolkits and web services.
Expected existing knowledge of participants:
This is an introductory tutorial and no background in entity linking or entity retrieval is required. Basic familiarity with statistics, data-driven approaches to language processing, and programming is assumed. Note that participants will need to bring a laptop for the final, hands-on part.
Bios:
Dr. Edgar Meij is a research scientist at Yahoo! Research. Before this, he was a postdoc at the University of Amsterdam, where he obtained his PhD in Computer Science. His current research focuses on entity linking and semantic search. He regularly teaches at the graduate and post-graduate level. He is a co-organizer of various entity-related NLP and IR workshops, including Reputation 2012 and RepLab.
Dr. Krisztian Balog is an associate professor at the University of Stavanger. His research concerns entity-oriented and semantic search. He has co-organized several SIGIR workshops on entity-oriented search as well as the TREC Entity track, and serves as area chair for IR and Structured Data at SIGIR 2013.
Daan Odijk, MSc is a PhD candidate at the University of Amsterdam under the supervision of Prof. Dr. Maarten de Rijke. Daan’s main research interests include information retrieval, text mining, machine learning, and information visualization. His PhD research involves exploring background information through automatic entity linking. He has previously taught courses and specialized tutorials on text mining, collective intelligence, and information visualization.
Half-Day Tutorials - Morning | 08:30 – 12:15
Music Similarity and Retrieval
Peter Knees and Markus Schedl (Johannes Kepler University Linz, Austria)
Room: 4050A, Arts Block, TCD
Abstract:
This tutorial serves as an introduction to the field and the state of the art of music information retrieval (MIR), and in particular to music similarity estimation, which is an essential component of music retrieval. Apart from briefly explaining approaches that estimate similarity based on acoustic properties of an audio signal, we will review methods that exploit (mostly textual) meta-data from the web to build representations of music that are then used for similarity calculation. Additionally, topics such as (large-scale) music indexing, information extraction for music, personalisation and adaptation in music retrieval, and evaluation of MIR systems will be addressed.
Expected existing knowledge of participants:
None
Bios:
Peter Knees is an Assistant Professor in the Department of Computational Perception at Johannes Kepler University Linz, where he researches and publishes on music, multimedia, and web IR. Together with Markus Schedl, he is currently preparing a book, “Music Similarity and Retrieval,” to appear in the Springer Information Retrieval series.
Markus Schedl is an Assistant Professor in the Department of Computational Perception at Johannes Kepler University Linz. He has (co-)authored more than 70 refereed conference papers and journal articles, predominantly on IR and multimedia. He regularly gives classes on music IR, data analysis, multimedia retrieval, and social media mining, which correspond to his research interests.
The Cluster Hypothesis in Information Retrieval
Oren Kurland (Technion - Israel Institute of Technology, Israel)
Room: 3126, Arts Block, TCD
Abstract:
The cluster hypothesis (van Rijsbergen '79) states that “closely associated documents tend to be relevant to the same requests”. This is one of the most fundamental and influential hypotheses in the information retrieval field. We will survey the different lines of work that the hypothesis has given rise to (e.g., cluster-based retrieval, using topic modeling for retrieval, search results visualization). The survey will be accompanied by an in-depth analysis of the retrieval techniques that are inspired by the cluster hypothesis and that are used for various tasks, including ad hoc retrieval, meta-search, microblog (e.g., Twitter) retrieval, query-performance prediction, and search-results diversification.
Expected existing knowledge of participants:
This is an intermediate-level tutorial. It is intended for those who are not familiar with the cluster hypothesis and/or the research topics and retrieval methods that are motivated by it. The prerequisites are familiarity with fundamental retrieval models, such as the vector space model and the language modeling framework, and with basic clustering methods.
Bios:
Oren Kurland is an associate professor at the Technion - Israel Institute of Technology. Oren has published papers covering different aspects of the cluster hypothesis. He serves on the editorial board of the Information Retrieval journal and has served as a senior program committee member in SIGIR and CIKM.
Scalability and Efficiency Challenges in Commercial Web Search Engines
B. Barla Cambazoglu and Ricardo Baeza-Yates (Yahoo! Research Barcelona, Spain)
Room: Swift Theatre, Arts Block, TCD
Abstract:
Commercial web search engines rely on very large compute infrastructures to be able to cope with the continuous growth of the Web and user bases. Achieving scalability and efficiency in such large-scale search engines requires making careful architectural design choices while devising algorithmic performance optimizations. Unfortunately, most details about the internal functioning of commercial web search engines remain undisclosed due to their financial value and the high level of competition in the search market. The main objective of this tutorial is to provide an overview of the fundamental scalability and efficiency challenges in commercial web search engines, bridging the existing gap between industry and academia.
Expected existing knowledge of participants:
Intermediate
Bios:
Berkant Barla Cambazoglu received his BS, MS, and PhD degrees, all in computer engineering, from the Computer Engineering Department of Bilkent University in 1997, 2000, and 2006, respectively. He is currently employed as a senior researcher at Yahoo! Research. His research interests include information retrieval, web search, and distributed computing.
Ricardo Baeza-Yates has been VP of Yahoo! Research for Europe and Latin America, leading the labs at Barcelona, Spain and Santiago, Chile, since 2006. He has also been a part-time Professor at the Dept. of Information and Communication Technologies of the Universitat Pompeu Fabra in Barcelona, Spain, since 2005.
Searching in the City of Knowledge: Challenges and Recent Developments
Veli Bicer and Vanessa Lopez (Smarter Cities Technology Centre, IBM Research, Dublin, Ireland)
Room: 4050B, Arts Block, TCD
Abstract:
Today, plenty of data is emerging from various city systems. Beyond the classical Web resources, large amounts of data are retrieved from sensors, devices, social networks, governmental applications, or service networks. Given such a diversity of information, answering the specific information needs of city inhabitants requires holistic IR techniques capable of harnessing different types of city data and turning them into actionable insights to answer different queries. This tutorial will present deep insights, challenges, opportunities and techniques to make heterogeneous city data searchable, and show how emerging IR techniques and models can be employed to retrieve relevant information for citizens.
Expected existing knowledge of participants:
None
Bios:
Veli Bicer is a researcher at the Smarter Cities Technology Centre of IBM Research in Dublin, Ireland. His research interests include semantic data management, semantic search, software engineering and statistical relational learning. He obtained his PhD from Karlsruhe Institute of Technology, Karlsruhe, Germany and B.Sc. and M.Sc. degrees in computer engineering from Middle East Technical University, Ankara, Turkey.
Vanessa Lopez joined IBM Research Ireland as a research engineer in January 2012. Prior to joining IBM, she was a research associate at the Knowledge Media Institute (The Open University, UK). She has been working on the topic of searching and querying heterogeneous semantic data for over eight years, in particular on question answering, ranking and merging.
Half-Day Tutorials - Afternoon | 13:30 – 17:30
Multimedia Recommendation: Technology and Techniques
Jialie Shen (School of Information Systems, Singapore Management University, Singapore), Meng Wang (Hefei University of Technology, China), Shuicheng Yan (National University of Singapore, Singapore) and Peng Cui (Tsinghua University, China)
Room: 4050B, Arts Block, TCD
Abstract:
Due to the rapid growth of online multimedia information, the problem of information overload has become more and more serious in recent decades. To address the issue, various recommendation technologies have been developed by different research communities (e.g., multimedia systems, information retrieval and machine learning). Meanwhile, many commercial Web systems (e.g., Flickr, YouTube, and Last.fm) have successfully applied recommendation techniques to provide users with personalized multimedia content and services in a convenient and flexible way.
While several tutorials and courses have been dedicated to media search and related topics in the last few years, to the best of our knowledge this tutorial is the first to focus solely on multimedia recommendation technologies and their applications to various domains and media content. We will give an overview of multimedia recommender systems and make some predictions about the road that lies ahead for IR researchers. Over the long run, we hope that the tutorial provides an impetus for further research on this important topic.
Expected existing knowledge of participants:
None
Bios:
Dr. Jialie Shen is an Assistant Professor in Information Systems, School of Information Systems, Singapore Management University, Singapore. He received his PhD in Computer Science from the University of New South Wales (UNSW), Australia in the area of large-scale media retrieval and database access methods. Dr. Shen's main research interests include information retrieval, multimedia systems and economic-aware media analysis. His recent work has been published or is forthcoming in leading journals and international conferences including ACM SIGIR, ACM Multimedia, ACM SIGMOD, CVPR, ICDE, WWW, IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT), IEEE Transactions on Multimedia (IEEE TMM), IEEE Transactions on Image Processing (IEEE TIP), ACM Multimedia Systems Journal, ACM Transactions on Internet Technology (ACM TOIT) and ACM Transactions on Information Systems (ACM TOIS). Besides being chair, PC member, reviewer and guest editor for several leading information systems journals and conferences, he is an associate editor of International Journal of Image and Graphics (IJIG) and area editor of Electronic Commerce Research and Applications (ECRA).
Dr. Meng Wang is a professor at the Hefei University of Technology, China. His current research interests include multimedia content analysis, search, mining, recommendation, and large-scale computing. He has authored more than 100 book chapters, journal and conference papers in these areas. He is an associate editor of Information Sciences and Neurocomputing. He received best paper awards at the 17th and 18th ACM International Conference on Multimedia and at the 16th International Multimedia Modeling Conference.
Dr. Shuicheng Yan is an Associate Professor in the Department of Electrical and Computer Engineering at the National University of Singapore, and the founding lead of the Learning and Vision Research Group (http://www.lv-nus.org). Dr. Yan's research areas include computer vision, multimedia and machine learning, and he has authored or co-authored over 280 technical papers on a wide range of research topics (H-index of 36). He is an associate editor of IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT) and ACM Transactions on Intelligent Systems and Technology (ACM TIST). He received the Best Paper Awards from PCM 2011, ACM MM 2010, ICME 2010 and ICIMCS 2009, the winner prizes of the classification task in both PASCAL VOC 2010 and PASCAL VOC 2011, the honorable mention prize of the detection task in PASCAL VOC 2010, the 2010 TCSVT Best Associate Editor (BAE) Award, the 2010 Young Faculty Research Award, the 2011 Singapore Young Scientist Award, and the 2012 NUS Young Researcher Award.
Dr. Peng Cui is an Assistant Professor in the Department of Computer Science and Technology, Tsinghua University. He received his PhD in Computer Science from Tsinghua University. His research interests include multimedia content analysis, social network analysis, and social multimedia computing. His recent research work has been published in leading conferences and journals, such as IEEE TMM, IEEE TIP, DMKD, SIGIR, AAAI, and ICDM. He has served as sponsor chair or co-chair for ACM SIGKDD 2012, an ACM MM 2011 workshop, and an IEEE ICME 2012 special session, among others. He is a guest editor or reviewer for many refereed journals, including the Information Retrieval journal, IEEE TMM, IEEE TCSVT, IEEE TKDE, and ACM TKDD.
Kernel-based Learning to Rank with Syntactic and Semantic Structures
Alessandro Moschitti (Qatar Computing Research Institute, Qatar Foundation, Doha, Qatar and DISI, University of Trento, Italy)
Room: 4050A, Arts Block, TCD
Abstract:
Kernel Methods (KMs) are powerful machine learning techniques that can alleviate the data representation problem: they replace the scalar product between feature vectors with similarity functions (kernels) defined directly between data instances, e.g., syntactic trees, so that explicit features are no longer needed. This tutorial aims to introduce the essential, simplified theory of Support Vector Machines and KMs for the design of practical applications. It will describe effective kernels for easily engineering automatic classifiers and learning-to-rank algorithms using structured data and semantic processing. Some examples will be drawn from Question Answering, Passage Re-ranking, Short and Long Text Categorization, Relation Extraction, Named Entity Recognition, and Co-Reference Resolution. In particular, state-of-the-art kernel technology currently encoded in the famous IBM DeepQA system, Watson, will be described. Finally, some practical demonstrations will be given using the SVM-Light-TK (tree kernel) toolkit.
Expected existing knowledge of participants:
Basic knowledge of machine learning and Natural Language Processing.
Bio:
Alessandro Moschitti is a Senior Research Scientist at QCRI and a tenured professor at the Computer Science Department of the University of Trento. He obtained his PhD in Computer Science from the University of Rome in 2003. He has worked as an associate researcher at the University of Texas at Dallas, as a visiting professor at Columbia University (NY), the University of Colorado at Boulder, and Johns Hopkins University, and as a visiting researcher at IBM Watson, NY. Dr. Moschitti has been the only European faculty member to participate in the Jeopardy! challenge. He has significant expertise in both theoretical and applied machine learning (ML), NLP, IR and Data Mining. He has devised innovative kernels for advanced syntactic/semantic processing with support vector and other kernel-based machines. He is an author or co-author of about 170 scientific articles published in major conferences, e.g., ACL, SIGIR, ICDM, ICML, CIKM, ECML, EMNLP, IJCAI, etc., and journals, e.g., Comp. Ling., IPM, DMKD, IS-IEEE, TASL-IEEE, etc. He is/has been an area chair for ACL (for the semantics and the ML tracks), for ECML PKDD and for IJCNLP. Additionally, he has been PC chair of other important conferences and workshops. Currently, he is on the editorial board of JAIR, JNLE and JoDS. He has coordinated (or participated in) seven EC projects, e.g., LivingKnowledge, LiMoSiNE, NAMIC. He has received two IBM Faculty Awards, one Google Faculty Award and several best paper awards (e.g., at ECML PKDD 2012).
Diversity and Novelty in Information Retrieval
Rodrygo L.T. Santos (Universidade Federal de Minas Gerais, Brazil), Pablo Castells (Universidad Autónoma de Madrid, Spain), Ismail Sengor Altingovde (Middle East Technical University, Turkey) and Fazli Can (Bilkent University, Turkey)
Room: Swift Theatre, Arts Block, TCD
Abstract:
Through a stream of active research and experience, work on diversity and novelty has by now consolidated into a significant body of techniques, methodologies, theories, and knowledge in the field of information retrieval. This tutorial aims to provide a unifying account of current research on diversity and novelty in different IR domains. In particular, the tutorial will cover the motivations, as well as the most established approaches for producing and evaluating diverse results, in the context of search engines, recommender systems, and data streams. By contrasting the state of the art in these multiple domains, this tutorial aims to derive a common understanding of the diversification problem and the existing solutions, their commonalities and differences, as a means to foster new research directions.
Expected existing knowledge of participants:
Introductory knowledge in the broad domains of search, recommendation, and data streams, as well as basics of probability.
Bios:
Rodrygo Santos is a research associate at UFMG, Brazil. He is a leading expert in search result diversification, with a PhD thesis and 18 papers on the topic. He is also a frequent participant at evaluation forums dedicated to search result diversification, such as TREC and NTCIR.
Pablo Castells is an associate professor at the Universidad Autónoma de Madrid, Spain. His research experience is focused on information retrieval, recommender systems, personalization and user modeling. He has led or participated in several national and international projects and has co-authored over 70 journal and conference publications in the aforementioned areas. In recent years his research has focused on diversity, novelty and evaluation in IR and recommender systems, with publications in venues such as RecSys and SIGIR.
Ismail Sengor Altingovde is an assistant professor of Computer Engineering at Middle East Technical University in Ankara, Turkey. His research interests include web IR, with a particular focus on search efficiency, social web and web databases. He recently participated in the EU FP7 project “LivingKnowledge”, where diversity was among the major themes, and he is currently working on efficiency issues for search result diversification. He has published over 40 papers in prestigious journals (including ACM TODS, TOIS, TWEB, JASIST and IP&M) and conferences (including SIGIR, VLDB, and CIKM).
Fazli Can is a professor of Computer Engineering at Bilkent University in Ankara, Turkey. He has published extensively in IR, data mining, database, computational linguistics, and multimedia conferences and journals such as IP&M, JASIST, ACM TOIS, ACM TODS, and Multimedia Tools and Applications. He was one of the two co-editors of ACM SIGIR Forum (1995-2002). He has served on several PCs and he was one of the general co-chairs of the IEEE/ACM ASONAM 2012 Conference. His recent works on topic tracking and novelty detection have been published in JASIST.
Designing Search Usability
Tony Russell-Rose (UXLabs)
Room: 3126, Arts Block, TCD
Abstract:
Search is not just a box and ten blue links. Search is a journey: an exploration where what we encounter along the way changes what we seek. But in order to guide people along this journey, we must understand both the art and science of search usability.
The aim of this tutorial is to deliver a learning experience grounded in good scholarship, integrating the latest research findings with insights derived from the practical experience of designing and optimizing an extensive range of commercial search applications. It focuses on the development of transferable, practical skills that can be learnt and practised within a half-day session.
Expected existing knowledge of participants:
This tutorial is aimed at IR researchers and practitioners, information architects and search specialists interested in designing more effective user experiences and interfaces for information retrieval and discovery. An awareness of the basic principles of user-centred design is useful (but not essential).
Bio:
Tony Russell-Rose is director of UXLabs, a research and design consultancy specialising in complex search and information access applications. Previously, Tony led R&D teams at Canon, Reuters, HP Labs and BT Labs. He is the author of "Designing the Search Experience" (Elsevier, 2012) and publishes widely on IR, HCI and NLP.