We integrate psychological theories and models of human decision making into machine learning systems to predict human decision making at state-of-the-art levels. Focusing on the most fundamental choice task from behavioral economics and using the largest datasets currently available, we study which theories and models, which types of machine learning algorithms and tools, and which methods of integration lead to the best out-of-sample predictions.
Understanding how to deal with model uncertainty is key to building resilient agents that can cope with unforeseen environments. For years, my research group has studied different approaches to building robust agents that can cope with different types of uncertainty. Robustness means that policies are immune to changes in the environment, which leads to better real-time performance. In a sequence of papers we developed robust reinforcement learning and planning algorithms, including scaling up such algorithms, learning the uncertainty set online, adapting quickly to unknown uncertainties, and online adaptation. The main application areas here are energy and transport services.
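As an illustration of what planning against an uncertainty set can look like, here is a minimal sketch of worst-case value iteration. It assumes a small tabular problem and a finite set of candidate transition models, with the adversary choosing a model independently for each state-action pair. The function name `robust_value_iteration` and the data layout are hypothetical and are not taken from the group's papers.

```python
def robust_value_iteration(rewards, models, gamma=0.9, iters=200):
    """Worst-case value iteration over a finite set of transition models.

    rewards[s][a] is the immediate reward; models[m][s][a] is a list of
    next-state probabilities under candidate model m (assumed layout).
    """
    num_states = len(rewards)
    values = [0.0] * num_states
    for _ in range(iters):
        new_values = []
        for s in range(num_states):
            best = float("-inf")
            for a in range(len(rewards[s])):
                # The adversary picks the model minimizing the expected next value.
                worst_next = min(
                    sum(p * v for p, v in zip(model[s][a], values))
                    for model in models
                )
                best = max(best, rewards[s][a] + gamma * worst_next)
            new_values.append(best)
        values = new_values
    return values
```

Passing a single model recovers ordinary value iteration, so the same function can be used to compare the nominal and the robust value of a state.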
We consider a reinforcement learning scheme for selecting how and what to transfer in 5G networks. The problem at hand is to decide which bit-rate to use and which channels would yield the best tradeoff in terms of power, performance, and cost. We employ multi-objective, multi-agent reinforcement learning to best decide how to transmit the data. In previous work, we proposed to use multi-armed bandit algorithms that ignore the current channel and agent state (see O. Avner and S. Mannor, Multi-User Communication Networks: A Coordinated Multi-Armed Bandit Approach, IEEE/ACM Transactions on Networking, vol. 27, no. 6, Dec. 2019, https://ieeexplore.ieee.org/document/8875003), but in this project we go further and consider the state of the transmission, the real-time requirements, and the changing channel.
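To make the state-free bandit baseline concrete, the following sketch shows a single user choosing among channels with the textbook UCB1 rule. It is not the coordinated multi-user algorithm of the cited paper. The Bernoulli success model and the names `ucb1_select` and `run_ucb1` are assumptions made for illustration.

```python
import math
import random

def ucb1_select(counts, means, t):
    """Pick the channel with the highest UCB1 index; untried channels go first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )

def run_ucb1(success_probs, rounds, seed=0):
    """Simulate one user; each channel succeeds with a fixed (assumed) probability."""
    rng = random.Random(seed)
    k = len(success_probs)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, means, t)
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the empirical mean reward of the chosen channel.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means
```

The selection rule depends only on past rewards per channel, which is exactly the limitation noted above: it has no notion of channel state, deadlines, or other agents.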
We consider the potential role of language as a regularizer in reinforcement learning. The objective is to create hierarchical reinforcement learning algorithms that are explainable by design: they use language to describe what they do. The language models can be learned, dictated, imitated, or created. In a paper that appeared at ICML 2019, we introduced Act2Vec, a general framework for learning context-based action representations for reinforcement learning. Representing actions in a vector space helps reinforcement learning algorithms achieve better performance by grouping similar actions and utilizing relations between different actions. We showed how prior knowledge of an environment can be extracted from demonstrations and injected into action vector representations that encode natural compatible behavior. We then used these representations to augment state representations and to improve function approximation of Q-values. We visualize and test action embeddings in three domains: a drawing task, a high-dimensional navigation task, and the large action space domain of StarCraft II.
Modern recommendation platforms have become complex, dynamic ecosystems. Platforms often rely on machine learning models to successfully match users to content, but most methods neglect to account for how they affect user behavior, satisfaction, and well-being over time. Here we propose a novel dynamical-systems perspective on recommendation that allows us to reason about, and control, macro-temporal aspects of recommendation policies as they relate to user behavior.
This project will enable unreliable edge computing nodes to jointly provide a reliable storage service for unpredictable user workloads. Edge systems consist of small-scale servers (nodes) at the edge of a network whose root is in a cloud-based datacenter. Their premise is to bring data and computing closer to time-critical applications running on, e.g., cellphones and autonomous vehicles. We combine storage redundancy schemes with scalable algorithms for object mapping and request scheduling.
Database schema matching is a challenging task that has called for improvement for several decades. Automatic algorithms fail to provide sufficiently reliable results. We use human matching to overcome algorithm failures and vice versa. We refer to human and algorithmic matchers as imperfect matchers with different strengths and weaknesses. We use insights from cognitive research to predict human matchers’ behavior and identify those who perform better than others. We then merge their responses with algorithmic outcomes to obtain better results.
Consider a setting where one agent holds private information and would like to use it to motivate another agent to take some action. When the agents’ interests coincide the solution is easy – disclose the full information. In this project we study optimal information design when the agents’ incentives are misaligned.
Improvements in training speed are needed to develop the next generation of deep learning models. Training such models requires a massive amount of computation; to perform it in a reasonable time, it is parallelized across multiple GPU cores. Perhaps the most popular parallelization method is to use a large batch of data in each iteration of SGD, so the gradient computation can be performed in parallel on multiple workers. We aim to enable massive parallelization without the performance degradation that is commonly observed.
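The large-batch scheme described here can be sketched in a few lines: each worker computes the gradient on its shard of the batch, and the averaged result equals the gradient over the whole batch when the shards are equal in size. The example below assumes a one-dimensional least-squares model and runs the "workers" sequentially. The names `grad_mse`, `data_parallel_grad`, and `sgd_step` are hypothetical.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error for the 1-D linear model y ~ w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def data_parallel_grad(w, batch, num_workers):
    """Split a batch into equal shards, compute per-worker gradients, average them."""
    shard = len(batch) // num_workers
    assert shard * num_workers == len(batch), "batch must split evenly"
    grads = [
        grad_mse(w, batch[i * shard:(i + 1) * shard])  # one shard per worker
        for i in range(num_workers)
    ]
    return sum(grads) / num_workers

def sgd_step(w, batch, lr, num_workers):
    """One large-batch SGD update using the averaged worker gradients."""
    return w - lr * data_parallel_grad(w, batch, num_workers)
```

The averaging step is mathematically the same as a single-worker update on the full batch, so any loss of generalization comes from the batch size itself rather than from the parallelization.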
We aim to improve the resource efficiency of deep learning (e.g., energy, bandwidth) for training and inference. Our focus is on decreasing the numerical precision of the neural network model, which is a simple and effective way to improve its resource efficiency. Nearly all recent deep learning hardware relies heavily on lower-precision arithmetic. The benefits are a reduction in the memory required to store the neural network, a reduction in chip area, and a drastic improvement in energy efficiency.
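As a simple illustration of reduced numerical precision, the sketch below applies symmetric uniform quantization to a list of weights, storing them as small signed integers plus one floating-point scale. This is a generic textbook scheme, not necessarily the one studied in the project. The names `quantize` and `dequantize` are hypothetical.

```python
def quantize(weights, num_bits=8):
    """Symmetric uniform quantization to signed integers; returns (ints, scale)."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    ints = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return ints, scale

def dequantize(ints, scale):
    """Map integer levels back to approximate real-valued weights."""
    return [q * scale for q in ints]
```

With this scheme the reconstruction error of each weight is at most half the scale, so halving the bit width roughly doubles the worst-case error while halving storage.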
Significant research efforts are being invested in improving Deep Neural Networks (DNNs) via various modifications. However, such modifications often cause an unexplained degradation in the generalization performance of DNNs on unseen data. Recent findings suggest that this degradation is caused by changes to the hidden algorithmic bias of the training algorithm and model. This bias determines which solution is selected from all the solutions that fit the data. We aim to understand and control this algorithmic bias.
Information recorded by service systems (e.g., in the telecommunication, finance, and health sectors) during their operation provides an angle for operational process analysis, commonly referred to as process mining. Here we establish a queueing perspective in process mining to address the online delay prediction problem, which refers to the time that the execution of an activity for a running instance of a service process is delayed due to queueing effects. We develop predictors for waiting times from event logs recorded by an information system during process execution. Based on large datasets from the telecommunications and financial sectors, our evaluation demonstrates accurate online predictions, which drastically improve over predictors that neglect the queueing perspective.
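One simple way to see how queueing information can enter a delay predictor is a queue-length-based estimate. The sketch below assumes that all servers are busy, that service times are exponential with a common rate, and that nobody abandons the queue; under those assumptions a new arrival who finds q customers waiting must wait for q + 1 service completions. The log layout and the names `estimate_service_rate` and `queue_length_delay` are hypothetical, and this is not claimed to be the project's predictor.

```python
def estimate_service_rate(service_intervals):
    """Estimate the per-server service rate as 1 / mean observed service duration.

    service_intervals is a list of (start_time, end_time) pairs taken from an
    event log (assumed layout).
    """
    durations = [end - start for start, end in service_intervals]
    return len(durations) / sum(durations)

def queue_length_delay(queue_length, num_servers, service_rate):
    """Expected wait of a new arrival that finds all servers busy.

    With num_servers busy servers, departures occur at rate
    num_servers * service_rate, and queue_length + 1 departures are needed.
    """
    return (queue_length + 1) / (num_servers * service_rate)
```

For example, with a mean service time of 5 minutes, two servers, and three customers already waiting, the estimate is 4 / (2 * 0.2) = 10 minutes.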
Consider a group of workers who answered questions, each of which has a correct yet unknown answer. The workers are heterogeneous: they could be ordinary people, trained volunteers, a panel of experts, different computer algorithms, or a mix of all the above. Our approach is based on empirical Bayes methods, and the aim is to construct an algorithm that aggregates all workers’ answers into a single output that is close to the unknown truth. (MSc student: Tsviel Ben-Shabat, co-advisor: Reshef Meir)
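To illustrate the aggregation problem, the sketch below uses a simple iterative heuristic for binary questions rather than the empirical Bayes method of the project. It starts from a plain majority vote, estimates each worker's accuracy by agreement with the current aggregate, and then re-votes with log-odds weights. All names (`weighted_vote`, `estimate_accuracies`, `aggregate`) and the smoothing constant are assumptions.

```python
import math

def weighted_vote(answers, accuracies):
    """answers[w][q] is in {0, 1}; each worker is weighted by log-odds of accuracy."""
    weights = [math.log(a / (1.0 - a)) for a in accuracies]
    result = []
    for q in range(len(answers[0])):
        score = sum(w if ans[q] == 1 else -w for w, ans in zip(weights, answers))
        result.append(1 if score > 0 else 0)
    return result

def estimate_accuracies(answers, reference, smoothing=1.0):
    """Smoothed fraction of questions on which each worker matches the reference."""
    n = len(reference)
    return [
        (sum(a == r for a, r in zip(ans, reference)) + smoothing) / (n + 2 * smoothing)
        for ans in answers
    ]

def aggregate(answers, rounds=5):
    """Alternate between estimating worker accuracies and re-voting."""
    # Equal accuracies give equal weights, i.e. a plain majority vote.
    estimate = weighted_vote(answers, [0.7] * len(answers))
    for _ in range(rounds):
        accuracies = estimate_accuracies(answers, estimate)
        estimate = weighted_vote(answers, accuracies)
    return estimate
```

The smoothing keeps every estimated accuracy strictly between 0 and 1, so the log-odds weights are always finite; workers who mostly disagree with the aggregate receive negative weight.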
In many economic design settings, strong assumptions are made about the knowledge of the designer. A canonical example from auction design is assuming perfect knowledge of how bidders’ willingness to pay is distributed. In which settings can we achieve designs with guarantees similar to those under full knowledge, despite knowing only a sample or a first moment of the prior distribution?
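A minimal example of designing from samples rather than from a known distribution is choosing a posted price for a single buyer. The sketch below picks the price that maximizes revenue on the observed sample of valuations. It is only an illustration of the setting, with hypothetical names `empirical_revenue` and `best_empirical_price`, and it makes no claim about the guarantees studied in the project.

```python
def empirical_revenue(price, samples):
    """Average revenue of a posted price if buyers' values were exactly the samples."""
    return price * sum(v >= price for v in samples) / len(samples)

def best_empirical_price(samples):
    """Posted price maximizing empirical revenue.

    Only sampled values need to be considered: raising a price up to the next
    sampled value never loses a sale on the sample.
    """
    candidates = sorted(set(samples))
    return max(candidates, key=lambda p: empirical_revenue(p, samples))
```

With only a few samples the chosen price can be driven by a single high value, which is one reason guarantees under limited knowledge are a non-trivial question.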
The goal of this research is to design classifiers robust to strategic behavior of the agents being classified. Here strategic behavior means incurring some cost in order to improve personal features and thus classification. This improvement can be superficial – i.e., gaming the classifier – or substantial, thus leading to true self-improvement. In the latter case (and only in this case), the robust classifier should actually encourage strategic behavior.
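The gaming case can be illustrated with a one-dimensional toy model. It assumes that an agent with feature x is accepted when x reaches a threshold, that raising the feature costs a fixed amount per unit, and that acceptance is worth a fixed benefit. The names `best_response` and `gaming_robust_threshold`, and the linear cost, are assumptions for illustration only.

```python
def best_response(x, threshold, cost_per_unit, benefit=1.0):
    """Feature presented by an agent who moves to the threshold only if it pays off."""
    if x >= threshold:
        return x
    move_cost = cost_per_unit * (threshold - x)
    return threshold if move_cost <= benefit else x

def gaming_robust_threshold(threshold, cost_per_unit, benefit=1.0):
    """Raise the threshold by the largest distance an agent is willing to move.

    After best responses, the accepted agents are then exactly those whose
    original feature met the intended threshold (when the change is pure gaming).
    """
    return threshold + benefit / cost_per_unit
```

When the feature change reflects genuine improvement rather than gaming, shifting the threshold in this way would be counterproductive, which is the distinction drawn in the paragraph above.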
Consider two strategic players, one more informed about the state of the world and the other less informed. How should the more informed side select what data to communicate to the other side, in order to inspire actions that benefit goals like social welfare? Can this be done under constraints such as privacy, limited communication, limited attention span, fairness, etc.?
We develop a fundamentally novel paradigm that seeks to find a simplification of a given POMDP problem, which is computationally easier, while at the same time providing performance guarantees, and ideally, similar levels of performance as the original decision making problem.
Based on this conceptually novel paradigm, we develop approaches that simplify the decision making problem, for example, by resorting to belief simplification or reward function simplification.
We develop approaches for autonomous semantic perception that address key challenges such as classification aliasing for certain relative viewpoints between object and camera, localization uncertainty, and epistemic uncertainty of the classifier. Specifically, we develop approaches for computationally efficient probabilistic inference and decision making in the context of semantic perception and SLAM. A key component here is a learned viewpoint-dependent classifier model.
Language is a window to a person’s mind and soul. Surprisingly, while few would disagree with this statement, most behavior prediction and analysis models do not consider language usage. We develop models that do exactly this, considering both economic setups (where game theory predictions consider only the numerical incentives of the participants) and psychological and psychiatric challenges (e.g., predicting suicide risk in the general population based on social media postings). Our goal is to integrate linguistic signals with other behavioral and medical signals, and to provide better prediction capabilities along with improved understanding of the underlying phenomena.
A fundamental problem of machine and deep learning models in NLP is that of spurious correlations. Such heavily parametrized models often capture data-driven patterns that are correlated with their task variables, but these patterns have little connection to the actual task they are trying to perform.
This, in turn, substantially harms their generalization capacity. We hence develop methods that follow the causal inference methodology for improved model generalization, interpretation, and stability.
Domain adaptation is the problem of adapting an algorithm trained on one domain (training distribution) so that it can effectively process data from other domains (e.g. adapting a sentiment classification algorithm trained on book reviews so that it can perform well on reviews of patient experience in clinics). We consider various very challenging setups of domain adaptation, focusing on setups where very limited resources and knowledge of the target domains are available when training the algorithm.
Effective learning from data requires prior assumptions, referred to as inductive bias. A fundamental question pertains to the source of a ‘good’ inductive bias. One natural way to form such a bias is through lifelong learning, where an agent continually interacts with the world through a sequence of tasks, aiming to improve its performance on future tasks based on the tasks it has encountered so far. We develop a theoretical framework for incremental inductive bias formation, and demonstrate its effectiveness in problems of sequential learning and decision making.