The Next Generation of Neural Networks

8:30-9:30 - July 27, 2020 (GMT+8)

Abstract: The most important unsolved problem with artificial neural networks is how to do unsupervised learning as effectively as the brain. There are currently two main approaches to unsupervised learning. In the first approach, exemplified by BERT and Variational Autoencoders, a deep neural network is used to reconstruct its input. This is problematic for images because the deepest layers of the network need to encode the fine details of the image. An alternative approach, introduced by Becker and Hinton in 1992, is to train two copies of a deep neural network to produce output vectors that have high mutual information when given two different crops of the same image as their inputs. This approach was designed to allow the representations to be untethered from irrelevant details of the input.

The method of optimizing mutual information used by Becker and Hinton was flawed (for a subtle reason that I will explain), so Paccanaro and Hinton replaced it with a discriminative objective in which one vector representation must select a corresponding vector representation from among many alternatives. With faster hardware, contrastive learning of representations has recently become very popular and is proving to be very effective, but it suffers from a major flaw: to learn pairs of representation vectors that have N bits of mutual information, we need to contrast the correct corresponding vector with about 2^N incorrect alternatives. I will describe a novel and effective way of dealing with this limitation. I will also show that this leads to a simple way of implementing perceptual learning in cortex.
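The 2^N figure follows from the information-theoretic bound on contrastive objectives: with K negatives, such a loss can certify at most log2(K+1) bits of mutual information. The sketch below is a generic InfoNCE-style loss over two batches of crop representations, an illustrative assumption rather than the exact method used in the talk:

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """Generic InfoNCE-style contrastive loss (illustrative sketch only).
    z1[i] and z2[i] are representations of two crops of the same image;
    the other rows of z2 serve as the incorrect alternatives (negatives)."""
    # Normalise rows so the dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (B, B) similarity matrix
    # Row i's positive is column i; the remaining B-1 columns are negatives.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def mi_bound_bits(num_negatives):
    """Upper bound (in bits) on the mutual information a contrastive loss
    with this many negatives can certify: N bits need about 2^N - 1 negatives."""
    return np.log2(num_negatives + 1)
```

With a batch of 8, each positive is contrasted against 7 negatives, so the loss can witness at most 3 bits of mutual information; this is the scaling problem the abstract refers to.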

Geoffrey Hinton received his PhD in Artificial Intelligence from Edinburgh in 1978. After five years as a faculty member at Carnegie-Mellon he became a fellow of the Canadian Institute for Advanced Research and moved to the Department of Computer Science at the University of Toronto where he is now an Emeritus Distinguished Professor. He is also a Vice President & Engineering Fellow at Google and Chief Scientific Adviser of the Vector Institute.

He was one of the researchers who introduced the backpropagation algorithm and the first to use backpropagation for learning word embeddings. His other contributions to neural network research include Boltzmann machines, distributed representations, time-delay neural nets, mixtures of experts, variational learning and deep learning. His research group in Toronto made major breakthroughs in deep learning that revolutionized speech recognition and object classification.


On Presuppositions of Machine Learning: A Meta Theory

14:30-15:30 - July 27, 2020 (GMT+8)

Abstract: Machine learning (ML) has been developed and applied on the basis of a series of presuppositions, which underlie both the great success of AI and the bottleneck limiting ML's further development. These presuppositions include (i) the independence assumption of the loss function from the dataset (Hypothesis I); (ii) the large-capacity assumption on the hypothesis space containing the solution (Hypothesis II); (iii) the completeness assumption of high-quality training data (Hypothesis III); and (iv) the Euclidean assumption on the analysis framework and methodology (Hypothesis IV).

We report, in this presentation, the efforts and advances made by my group in breaking through these presuppositions of ML and driving ML development. For Hypothesis I, we introduce the noise modeling principle to adaptively design the loss function of ML according to the distribution of the data samples, which then provides a general way to robustify any ML implementation. For Hypothesis II, we propose the model-driven deep learning approach to define the smallest hypothesis space of deep neural networks (DNNs), which yields not only very efficient deep learning but also a novel way of DNN design and interpretation, and a connection with traditional optimization-based approaches. For Hypothesis III, we develop the axiomatic curriculum learning framework to learn patterns from an incomplete dataset step by step and from easy to difficult, which then provides feasible ways to tackle very complex incomplete datasets. Finally, for Hypothesis IV, we introduce Banach space geometry in general, and the XU-Roach theorem in particular, as a possibly useful tool for non-Euclidean analysis of ML problems. In each case, we present the idea, principles, application examples and related literature.
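The easy-to-difficult idea behind Hypothesis III can be illustrated with a generic curriculum-learning sketch. Scoring difficulty by the current per-sample loss and releasing data in growing stages is an illustrative assumption here, not the axiomatic framework presented in the talk:

```python
import numpy as np

def curriculum_batches(losses, num_stages=4):
    """Generic curriculum-learning sketch (not the talk's axiomatic
    framework): rank samples from easy to hard by their current loss
    and release them to the learner in progressively larger stages."""
    order = np.argsort(losses)  # easiest samples (lowest loss) first
    n = len(order)
    for stage in range(1, num_stages + 1):
        # Stage k trains on the easiest k/num_stages fraction of the data.
        cutoff = int(np.ceil(n * stage / num_stages))
        yield order[:cutoff]
```

Each yielded index set is a training pool: early stages fit only easy patterns, and the full (possibly incomplete, noisy) dataset is reached only in the final stage.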

  • Zongben Xu, Xi’an Jiaotong University & Pazhou Lab, Guangzhou

Zong-Ben Xu received his PhD degree in Mathematics in 1987 from Xi’an Jiaotong University, China. In 1998, he was a Postdoctoral Research Fellow in the Department of Mathematics at The University of Strathclyde.

He worked as a Research Fellow in the Department of Computer Science and Engineering at The Chinese University of Hong Kong from 1992 to 1994 and from 1996 to 1997, and as a visiting professor at the University of Essex in 2001 and at Napoli University in 2002. He has been with the School of Mathematics and Statistics at Xi’an Jiaotong University since 1982, where he has served as a professor of mathematics and computer science, Dean of Sciences (1997-2003), Vice President of the university (2003-2014) and Chief Scientist of the National Basic Research Program of China (973 Project). He is currently the director of Pazhou Lab, Guangzhou, and of the National Lab for Big Data Analytics, Xi’an. He is also the Dean of the Xi’an Academy of Mathematics and Mathematical Technology.

Professor Xu currently performs several important services for government and professional societies, including as a consultant member of the National Big Data Development Commission, the New Generation AI Development Commission and the National Natural Science Foundation of China. He is VP of the Industrial and Applied Mathematics Society of China (CSIAM) and director of the CSIAM Big Data and AI Committee. He is also co-Editor-in-Chief of the Journal of Big Data Analytics and of the Textbook Series on Data Science and Big Data Technology (Higher Education Press of China).

Professor Xu has published over 280 academic papers on nonlinear functional analysis, optimization, machine learning and big data research, most of them in international journals. His current research interests include mathematical theory and fundamental algorithms for big data analysis, machine learning and data science. Professor Xu has received many academic awards, including the National Natural Science Award (2007) and the National Scientific and Technological Advance Award (2011) of China, the CSIAM Su Buchin Applied Mathematics Award (2008) and the Tan Kah Kee Science Award in Information Technology Science (2018). He delivered a 45-minute talk at the International Congress of Mathematicians (ICM 2010) at the invitation of the congress committee. He was elected a member of the Chinese Academy of Sciences in 2011.


Coopetition in IR Research

8:30-9:30 - July 28, 2020 (GMT+8)

Abstract: Coopetitions are activities in which competitors cooperate for a common good. Community evaluations such as the Text REtrieval Conference (TREC) are prototypical examples of coopetitions in information retrieval (IR) and have now been part of the field for almost thirty years. This longevity and the proliferation of shared evaluation tasks suggest that, indeed, the net impact of community evaluations is positive. But what are these benefits, and what are the attendant costs?

This talk will use TREC tracks as case studies to explore the benefits and disadvantages of different evaluation task designs. Coopetitions can improve state-of-the-art effectiveness for a retrieval task by establishing a research cohort and constructing the infrastructure, including problem definition, test collections, scoring metrics, and research methodology, necessary to make progress on the task. They can also facilitate technology transfer and amortize the infrastructure costs. The primary danger of coopetitions is for an entire research community to overfit to some peculiarity of the evaluation task. This risk can be minimized by building multiple test sets and regularly updating the evaluation task.

Ellen Voorhees is a Senior Research Scientist at the US National Institute of Standards and Technology (NIST). Her primary responsibility at NIST is to manage the Text REtrieval Conference (TREC) project, a project that develops the infrastructure required for large-scale evaluation of search engines and other information access technology. Voorhees' research focuses on developing and validating appropriate evaluation schemes to measure system effectiveness for diverse user tasks.

Voorhees received a B.Sc. in computer science from the Pennsylvania State University, and M.Sc. and Ph.D. degrees in computer science from Cornell University. Prior to joining NIST she was a Senior Member of Technical Staff at Siemens Corporate Research in Princeton, NJ where her work on intelligent agents applied to information access resulted in three patents. Voorhees is a fellow of the ACM, a member of AAAI, and has been elected as a fellow of the Washington Academy of Sciences. She has published numerous articles on information retrieval techniques and evaluation methodologies and serves on the review boards of journals and conferences.


Proof by Experimentation? Towards Better IR Research

14:30-15:30 - July 28, 2020 (GMT+8)

Abstract: The current fight against the COVID-19 pandemic illustrates the importance of proper scientific methods: Besides fake news lacking any factual evidence, reports on clinical trials with various drugs often yield contradicting results; here, only a closer look at the underlying empirical methodology can help in forming a clearer picture.

In IR research, empirical foundations in the form of experiments play an important role. However, the methods applied are often not at the level of scientific standards that hold in many other disciplines, as IR experiments are frequently flawed in several ways: measures like MRR or ERR are invalid by definition, and MAP is based on unrealistic assumptions about user behaviour; computing relative improvements of arithmetic means is statistical nonsense; test hypotheses are often formulated after the experiment has been carried out; multiple hypotheses are tested without correction; many experimental results are not reproducible or are compared to weak baselines [1, 6]; frequent reuse of the same test collections yields random results [2]; and authors (and reviewers) believe that experiments prove the claims made. Methods for overcoming these problems have been pointed out [5], but are still widely ignored.
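For concreteness, here is how two of the criticised measures are computed for a single query with binary relevance judgements. This is a minimal sketch; the flaws the abstract points to lie in the user models these formulas imply, not in the arithmetic itself:

```python
def reciprocal_rank(relevant, ranking):
    """RR for one query: 1/rank of the first relevant document, 0 if none.
    MRR is the arithmetic mean of RR over queries."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(relevant, ranking):
    """AP for one query: mean of precision@k over the ranks of relevant
    documents. MAP, the arithmetic mean of AP over queries, implies a user
    who scans down to every relevant document, an assumption the abstract
    argues is unrealistic."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0
```

Averaging such per-query scores arithmetically, and then reporting relative improvements of those means, is exactly the practice the abstract calls into question.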

However, even when experimental results have been achieved via proper methods, this only solves the issue of internal validity. The problem of external validity has hardly been addressed in IR so far. Having empirical results for just a handful of test collections does not enable us to make any statements about how far these observations generalise. So we should put more emphasis on understanding why certain methods work (or don’t work) under certain circumstances, instead of looking at improvements in the third or fourth decimal place. This would allow us to make more generally valid statements, and would be a first step towards being able to predict performance for new collections [3, 4].

To conclude, better research in IR can only be achieved by

  • Enforcing rigorous experimental methodology at our top venues.
  • Establishing leaderboards and carrying out metastudies for monitoring the actual scientific progress in our field.
  • Valuing understanding higher than raw performance.
  • Ultimately, aiming research more at performance prediction than at performance measurement.

Norbert Fuhr holds a PhD (Dr.) in Computer Science from the Technical University of Darmstadt, which he received in 1986. He became Associate Professor in the computer science department of the University of Dortmund in 1991 and was appointed Full Professor for computer science at the University of Duisburg-Essen in 2002.

His past research dealt with topics such as probabilistic retrieval models, the integration of IR and databases, retrieval in distributed digital libraries and XML documents, and user friendly retrieval interfaces. His current research interests are models for interactive retrieval, social media retrieval, and evaluation methodology.

Norbert Fuhr has served as PC member and program chair of major conferences in IR and digital libraries, and on the editorial boards of several journals in these areas. In 2012, he received the Gerald Salton Award of ACM-SIGIR.


From Information to Assistance

8:30-9:30 - July 29, 2020 (GMT+8)

Abstract: “Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it. When we enquire into any subject, the first thing we have to do is to know what books have treated of it. This leads us to look at catalogues, and at the backs of books in libraries.” – Samuel Johnson (Boswell’s Life of Johnson)

When Johnson was writing this, said libraries were very exclusive, inaccessible to most. When I was growing up the library was a favorite place to find information with the help of expert assistants, trained librarians. Nowadays, while libraries are still one of my favorite institutions, we have powerful digital information search, retrieval, and assemblage services, bundled into easily accessible tools at our fingertips.

As information proliferates and human information needs remain high, information retrieval will continue to be a central area of investigation. We will also need better and better tools to access, assemble, and represent that information in ways that can be understood and applied, tools that ensure information turns into knowledge that is useful and used.

In this talk, I will focus on how people find information, and how the tools we build aid in that finding. I will outline some areas that remain challenging, and offer case studies and edge cases where more work is needed. I will share thoughts on how emerging assistant devices and services are, and are not, meeting the challenge of becoming expert information assistants.

Elizabeth Churchill is a Director of UX at Google. She is also the Executive Vice President of the Association for Computing Machinery (ACM), a member of the ACM’s CHI Academy, and an ACM Fellow, Distinguished Scientist, and Distinguished Speaker. With a background in psychology, artificial intelligence and cognitive science, she draws on the social, computer, engineering and data sciences to create innovative digital tools, applications, and services. She has built research teams at Google, eBay, Yahoo, PARC and FujiXerox. She holds a PhD from the University of Cambridge and honorary doctorates from the University of Sussex and the University of Stockholm. In 2016, she received the CITRIS-Banatao Institute Athena Award for Executive Leadership.


How Deep Learning Works for Information Retrieval

14:30-15:30 - July 29, 2020 (GMT+8)

Abstract: Information retrieval (IR) is the science of search: the search for pieces of information relevant to a user query within a collection of unstructured resources. Information in this context includes text, imagery, audio, video, XML, programs, and metadata. The journey of an IR process begins with a user query sent to the IR system, which encodes the query, compares it with the available resources, and returns the most relevant pieces of information. Thus, the system is equipped with the ability to store, retrieve and maintain information.
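The encode-compare-return pipeline just described can be sketched end to end. The bag-of-words encoder and cosine similarity below are deliberately simple illustrative stand-ins for the learned components the talk is about:

```python
import math
from collections import Counter

def encode(text):
    """Bag-of-words encoding: a deliberately simple stand-in for a
    learned, data-driven encoder."""
    return Counter(text.lower().split())

def cosine(q, d):
    """Relevance measure: cosine similarity between two term-count vectors."""
    dot = sum(q[t] * d.get(t, 0) for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=3):
    """Encode the query, compare it with every resource, and return the
    k most relevant pieces of information."""
    q = encode(query)
    return sorted(docs, key=lambda d: cosine(q, encode(d)), reverse=True)[:k]
```

Replacing `encode` with a deep neural encoder and `cosine` with a learned relevance measure yields the data-driven pipeline the abstract goes on to discuss.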

In the early era of IR, the whole process was carried out using handcrafted features and ad hoc relevance measures. Later, principled frameworks for relevance measurement were developed on the basis of statistical learning. Recently, deep learning has opened up many more opportunities for IR, because data-driven features combined with data-driven relevance measures can effectively eliminate human bias in both feature design and relevance-measure design.

Deep learning has shown significant potential to transform IR, as evidenced by abundant empirical results. However, we still lack a comprehensive understanding of deep learning. Gaining one requires answering questions such as why deep structures are superior to shallow ones, how skip connections affect a model’s performance, what relationships hold between hyper-parameters and a model’s performance, and how to reduce the chance of deep models being fooled by adversaries. Answering such questions can help design more effective deep models and devise more efficient schemes for model training.

Dacheng Tao is Professor of Computer Science and ARC Laureate Fellow in the School of Computer Science and the Faculty of Engineering, and the Inaugural Director of the UBTECH Sydney Artificial Intelligence Centre, at The University of Sydney. His research results in artificial intelligence have been expounded in one monograph and 200+ publications in leading journals and conferences, such as IEEE TPAMI, AAAI, IJCAI, NeurIPS, ICML, CVPR, ICCV, ECCV, ICDM, and KDD, with several best paper awards. He received the 2018 IEEE ICDM Research Contributions Award and the 2015 Australian Museum Scopus-Eureka Prize. He is a Fellow of the IEEE, the ACM and the Australian Academy of Science.