Crowdsourcing for Search Evaluation and Social-Algorithmic Search


Bio:

Matthew Lease is an Assistant Professor in the School of Information at the University of Texas at Austin. He earned his Ph.D. in Computer Science from Brown University in 2009. Lease presented a keynote on crowd computing at IJCNLP 2011, received the Modeling Challenge Award at the 2012 International Conference on Social Computing, Behavioral-Cultural Modeling, & Prediction, and received a 2012 DARPA Young Faculty Award. He is currently co-organizing the 2nd Annual Crowdsourcing Track for the 2012 NIST Text REtrieval Conference (TREC).
In addition to his past crowdsourcing tutorials with Alonso, Lease has published various related research papers, co-organized crowdsourcing workshops at SIGIR 2009, WSDM 2010, and SIGIR 2010, and created one of the first graduate-level university courses on crowdsourcing.

Omar Alonso is a senior tech lead on Bing's social search team in Mountain View, CA. Alonso presented a keynote on crowdsourcing for information retrieval at CLEF 2011. In addition to his ongoing collaboration with Lease, Alonso has published a number of articles on human computation/crowdsourcing and participated in many workshops and meetups. He holds a Ph.D. in Computer Science from the University of California at Davis.

Summary:

Just as cloud computing enables us to utilize vast Internet computing resources on demand and at scale, crowdsourcing lets us similarly call upon the online crowd to perform human computation tasks on demand and at scale. For example, labeled data for system training and evaluation can now be collected faster, cheaper, and easier than ever before, offering a competitive edge for those able to take advantage of it effectively. The newfound ability to integrate human computation alongside algorithms also greatly expands traditional accuracy-time-cost tradeoffs and represents a disruptive shift in the design and implementation of computing systems. New hybrid, socio-computational systems can harness the collective intelligence (or wisdom) of the crowd in concert with automation to better tackle large and/or difficult processing tasks.

While Amazon's Mechanical Turk (MTurk) platform helped launch the crowdsourcing industry six years ago, today's industry boasts a myriad of other vendors offering different features and workflow models. Many of these vendors are not well known, especially in the research community, which stands to gain from greater awareness of and experience with these new service providers. And although the IR community already has a rich understanding of systems and user-centered design issues, crowd-based computing represents a significant departure from existing knowledge and practice. How might we innovate the design, implementation, and evaluation of IR systems in order to effectively incentivize, incorporate, and benefit from crowd participation?

Preliminary Outline
* Introduce key concepts of crowdsourcing, human computation, and the wisdom of crowds
* Survey recent “killer apps” which exemplify successful application of crowdsourcing principles
* Survey opportunities and cases of using crowdsourcing and human computation for IR:
- Evaluation: relevance judging, interactive IR studies, collecting log data
- Training: crowd-based active learning (e.g. learning to rank)
- Search: crowd-answering, crowd-verification, crowd-collaborations, searching the real world
* Summarize recent surveys of crowd demographics to inform user studies and design practices for crowd work
* Discuss participation motives and incentive structures that encourage engagement and high-quality output
* Survey quality control and spam detection mechanisms
* Provide practical “how to” guidance for getting started with Amazon's Mechanical Turk and other platforms
* Survey the new generation of crowdsourcing platform alternatives and implications for effective design
* Summarize key best practices for achieving efficient, inexpensive, and accurate work by combining human-centric practices with sophisticated statistical methods (see the brief sketch following this outline)
* Discuss problems and challenges when deploying crowdsourcing and human computation in industrial settings
* Review emerging opportunities, untapped potential, and open challenges for IR crowdsourcing
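
To make the statistical side of quality control concrete, the following minimal sketch (in Python, using hypothetical worker IDs and judgments) aggregates redundant crowd relevance labels by majority vote and then scores each worker by agreement with the consensus, a simple quality signal; real deployments typically employ richer models, such as EM-based estimation of worker accuracy, of the kind surveyed in the tutorial.

    # Minimal sketch: aggregate redundant crowd relevance labels by majority vote,
    # then score each worker by agreement with the consensus (a simple quality signal).
    from collections import Counter, defaultdict

    # Hypothetical judgments: (worker_id, query_doc_pair, label) with label in {0, 1}.
    judgments = [
        ("w1", "q1-d1", 1), ("w2", "q1-d1", 1), ("w3", "q1-d1", 0),
        ("w1", "q1-d2", 0), ("w2", "q1-d2", 0), ("w3", "q1-d2", 0),
        ("w1", "q2-d1", 1), ("w2", "q2-d1", 0), ("w3", "q2-d1", 1),
    ]

    # 1. Collect the labels assigned to each query-document pair.
    labels_by_pair = defaultdict(list)
    for worker, pair, label in judgments:
        labels_by_pair[pair].append(label)

    # 2. Majority vote yields a consensus label per pair.
    consensus = {pair: Counter(labels).most_common(1)[0][0]
                 for pair, labels in labels_by_pair.items()}

    # 3. Rate each worker by how often they agree with the consensus.
    agreement = defaultdict(lambda: [0, 0])   # worker -> [agreements, total]
    for worker, pair, label in judgments:
        agreement[worker][1] += 1
        if label == consensus[pair]:
            agreement[worker][0] += 1

    for worker, (agree, total) in sorted(agreement.items()):
        print(f"{worker}: {agree}/{total} agreement with consensus")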

Attendees will be supplied with a full set of tutorial notes and a supporting bibliography.