publications | Ekin Akyürek

2024

Preprint

The Surprising Effectiveness of Test-Time Training for Abstract Reasoning

E.A., Mehul Damani, Linlu Qiu, Han Guo, Yoon Kim, and Jacob Andreas
Preprint

Learning Linear Attention in Polynomial Time

Morris Yau, E.A., Jiayuan Mao, Joshua B. Tenenbaum, Stefanie Jegelka, and Jacob Andreas
Preprint

Elements of World Knowledge (EWOK): A cognition-inspired framework for evaluating basic world knowledge in language models

Anna A. Ivanova, Aalok Sathe, Benjamin Lipkin, Unnathi Kumar, Setayesh Radkani, Thomas H. Clark, Carina Kauf, Jennifer Hu, R. T. Pramod, Gabriel Grand, and 10 more authors
ICML

In-Context Language Learning: Architectures and Algorithms

E.A., Bailin Wang, Yoon Kim, and Jacob Andreas
ACL

Deductive Closure Training of Language Models for Coherence, Accuracy, and Updatability

Afra Feyza Akyürek, E.A., Leshem Choshen, Derry Wijaya, and Jacob Andreas
NAACL

Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks

Zhaofeng Wu, Linlu Qiu, Alexis Ross, E.A., Boyuan Chen, Bailin Wang, Najoung Kim, Jacob Andreas, and Yoon Kim

2023

ACL

RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs

Afra Feyza Akyürek, E.A., Ashwin Kalyan, Peter Clark, Derry Tanti Wijaya, and Niket Tandon

Despite their unprecedented success, even the largest language models make mistakes. Similar to how humans learn and improve using feedback, previous work proposed providing language models with natural language feedback to guide them in repairing their outputs. Because human-generated critiques are expensive to obtain, researchers have devised learned critique generators in lieu of human critics, while assuming one can train downstream models to utilize generated feedback. However, this approach does not apply to black-box or limited-access models such as ChatGPT, as they cannot be fine-tuned. Moreover, in the era of large general-purpose language agents, fine-tuning is neither computationally nor spatially efficient, as it results in multiple copies of the network. In this work, we introduce RL4F (Reinforcement Learning for Feedback), a multi-agent collaborative framework in which the critique generator is trained to maximize the end-task performance of GPT-3, a fixed model more than 200 times its size. RL4F produces critiques that help GPT-3 revise its outputs. We study three datasets for action planning, summarization, and alphabetization, and show relative improvements of up to 10% in multiple text similarity metrics over other learned, retrieval-augmented, or prompting-based critique generators.
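The setup above, where only the critique generator is trained and the large task model stays fixed and is observed solely through an end-task score, can be illustrated with a minimal REINFORCE-style policy-gradient sketch in Python. This is a swapped-in, deliberately simplified technique, not RL4F's actual training procedure: the candidate critiques, the reward function `frozen_task_model_reward`, and the loop `train_critic` are hypothetical stand-ins, and the "critic" here is just a softmax over three fixed strings rather than a language model.

```python
import math
import random
from typing import Callable, List

# Hypothetical candidate critiques the toy critic can choose between.
CRITIQUES: List[str] = [
    "no change needed",
    "sort the words alphabetically",
    "add a missing step",
]


def frozen_task_model_reward(critique: str) -> float:
    """Hypothetical stand-in for a fixed, black-box task model.

    Only a scalar end-task score is observed; nothing here is ever updated.
    We pretend the revision succeeds only with the sorting critique.
    """
    return 1.0 if critique == "sort the words alphabetically" else 0.0


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def train_critic(reward_fn: Callable[[str], float],
                 steps: int = 500, lr: float = 0.5, seed: int = 0) -> List[float]:
    """REINFORCE on a categorical policy; returns final critique probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(CRITIQUES)  # the only trainable parameters
    baseline = 0.0                   # running mean reward, reduces variance
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(CRITIQUES)), weights=probs)[0]
        reward = reward_fn(CRITIQUES[i])  # query the frozen model
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # Gradient of log softmax(logits)[i] is one_hot(i) - probs.
        for j in range(len(logits)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    for critique, p in zip(CRITIQUES, train_critic(frozen_task_model_reward)):
        print(f"{p:.3f}  {critique}")
```

Because the reward is the only signal read from the task model, the same loop applies whether or not that model can be fine-tuned, which is the property the abstract stresses for black-box models.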
ICLR (Notable, top 5%)

What learning algorithm is in-context learning? Investigations with linear models

E.A., Dale Schuurmans, Jacob Andreas, Tengyu Ma, and Denny Zhou
ICLR

Compositional Semantic Parsing with Large Language Models

Andrew Drozdov, Nathanael Schärli, E.A., Nathan Scales, Xinying Song, Xinyun Chen, Olivier Bousquet, and Denny Zhou
ACL (Area Chair Award)

LexSym: Compositionality as Lexical Symmetry

E.A., and Jacob Andreas
SIMAX

Backpropagation through Back Substitution with a Backslash

Alan Edelman, E.A., and Yuyang Wang

2022

NeurIPS (Oral)

Pre-trained language models for interactive decision-making

Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, E.A., Anima Anandkumar, and 4 more authors
EMNLP

Towards Tracing Knowledge in Language Models Back to the Training Data

E.A., Tolga Bolukbasi, Frederick Liu, Binbin Xiong, Ian Tenney, Jacob Andreas, and Kelvin Guu
ICLR

Subspace Regularizers for Few-Shot Class Incremental Learning

Afra Feyza Akyürek, E.A., Derry Wijaya, and Jacob Andreas

2021

ACL (Oral)

Lexicon Learning for Few Shot Sequence Modeling

E.A., and Jacob Andreas
ICLR

Learning to Recombine and Resample Data For Compositional Generalization

E.A., Afra Feyza Akyürek, and Jacob Andreas

2019

TACL

Morphological Analysis Using a Sequence Decoder

E.A., Erenay Dayanık, and Deniz Yuret

We introduce Morse, a recurrent encoder-decoder model that produces morphological analyses of each word in a sentence. The encoder turns the relevant information about the word and its context into a fixed-size vector representation, and the decoder generates the sequence of characters for the lemma followed by a sequence of individual morphological features. We show that generating morphological features individually rather than as a combined tag allows the model to handle rare or unseen tags and to outperform whole-tag models. In addition, generating morphological features as a sequence rather than, for example, an unordered set allows our model to produce an arbitrary number of features that represent multiple inflectional groups in morphologically complex languages. We obtain state-of-the-art results in nine languages of different morphological complexity under low-resource, high-resource, and transfer learning settings. We also introduce TrMor2018, a new high-accuracy Turkish morphology data set. Our Morse implementation and the TrMor2018 data set are available online to support future research: see https://github.com/ai-ku/Morse.jl for a Morse implementation in Julia/Knet (Yuret, 2016) and https://github.com/ai-ku/TrMor2018 for the new Turkish data set.
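To make the whole-tag versus per-feature distinction concrete, here is a minimal Python sketch, assuming a toy set of analyses with illustrative, hypothetical feature names (not drawn from TrMor2018). It does not implement the Morse model; it only compares which output tokens a decoder trained on each target format could ever emit, which is why per-feature generation can cover a tag combination never seen in training.

```python
from typing import List, Set, Tuple

Analysis = Tuple[str, List[str]]  # (lemma, ordered morphological features)

# Hypothetical training analyses; feature names are illustrative only.
TRAIN: List[Analysis] = [
    ("ev", ["Noun", "A3pl", "P3sg", "Abl"]),
    ("ev", ["Noun", "A3sg", "Pnon", "Loc"]),
]
# A test analysis whose full feature combination never occurs in TRAIN.
TEST: Analysis = ("ev", ["Noun", "A3pl", "Pnon", "Loc"])


def whole_tag_target(a: Analysis) -> List[str]:
    """Lemma characters followed by ONE token for the combined tag."""
    lemma, feats = a
    return list(lemma) + ["+".join(feats)]


def feature_seq_target(a: Analysis) -> List[str]:
    """Lemma characters followed by one token per feature, in order."""
    lemma, feats = a
    return list(lemma) + feats


def output_vocab(targets: List[List[str]]) -> Set[str]:
    """All tokens a decoder trained on these targets could emit."""
    return {tok for t in targets for tok in t}


def can_generate(target: List[str], vocab: Set[str]) -> bool:
    """True if every token of the target is in the decoder's vocabulary."""
    return all(tok in vocab for tok in target)


whole_vocab = output_vocab([whole_tag_target(a) for a in TRAIN])
feat_vocab = output_vocab([feature_seq_target(a) for a in TRAIN])

print(can_generate(whole_tag_target(TEST), whole_vocab))   # False: unseen combined tag
print(can_generate(feature_seq_target(TEST), feat_vocab))  # True: each feature was seen
```

Since the feature targets are plain variable-length sequences, the same representation also accommodates analyses with more features, such as words with several inflectional groups, without enlarging a fixed tag inventory.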

2018

TEI

Through the Glance Mug: A Familiar Artefact to Support Opportunistic Search in Meetings

Ahmet Börütecene, Idil Bostan, E.A., Alpay Sabuncuoglu, Ilker Temuzkusu, Çaglar Genç, Tilbe Göksun, and Oguzhan Özcan