
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training

The paper proposes "Sophia," a simple and scalable second-order optimizer that uses a lightweight estimate of the diagonal Hessian as a preconditioner, reducing the time and cost of training language models and achieving a 2x speed-up over Adam on language modeling with GPT-2 models ranging from 125M to 770M parameters.
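As a rough sketch of the idea (not the paper's exact algorithm), a Sophia-style update keeps exponential moving averages of the gradient and of a diagonal-Hessian estimate, then clips the preconditioned step elementwise so that steps stay bounded where the curvature estimate is tiny. The hyperparameter names and the toy quadratic below are illustrative:

```python
import numpy as np

def sophia_step(theta, m, h, grad, hess_diag_est, lr=0.05,
                beta1=0.9, beta2=0.99, gamma=1.0, eps=1e-12, rho=1.0):
    """One Sophia-style update step (simplified sketch).

    m: EMA of gradients; h: EMA of a diagonal-Hessian estimate.
    The preconditioned step m / max(gamma*h, eps) is clipped
    elementwise to [-rho, rho] before being applied.
    """
    m = beta1 * m + (1 - beta1) * grad
    h = beta2 * h + (1 - beta2) * hess_diag_est
    update = np.clip(m / np.maximum(gamma * h, eps), -rho, rho)
    theta = theta - lr * update
    return theta, m, h

# Toy quadratic loss f(theta) = 0.5 * theta @ (d * theta):
# gradient is d * theta, exact diagonal Hessian is d.
d = np.array([1.0, 10.0])
theta = np.array([1.0, -1.0])
m = np.zeros_like(theta)
h = np.zeros_like(theta)
for _ in range(300):
    grad = d * theta
    theta, m, h = sophia_step(theta, m, h, grad, d)
```

Because the clipped step is bounded by `lr * rho` per coordinate, poorly conditioned or noisy curvature estimates cannot produce arbitrarily large updates.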

Landmark Attention: Random-Access Infinite Context Length for Transformers

A new approach is presented to extend transformers' capability of handling longer contexts by adding a landmark token to each block of input; attention to these landmarks lets the model retrieve relevant blocks, allowing seamless integration with specialized data structures and enabling processing of arbitrary context lengths while retaining flexibility.
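The block-marking step can be illustrated with a toy preprocessing pass; the `LANDMARK` string and `block_size` below are illustrative stand-ins, and in the actual method attention scores on these tokens are what select which blocks to retrieve:

```python
# Hypothetical marker token; the real method uses a learned embedding.
LANDMARK = "<landmark>"

def add_landmarks(tokens, block_size=4):
    """Append a landmark token after every block of `block_size` tokens."""
    out = []
    for i, tok in enumerate(tokens, start=1):
        out.append(tok)
        if i % block_size == 0:
            out.append(LANDMARK)
    return out

tokens = list("abcdefgh")
marked = add_landmarks(tokens, block_size=4)
```

Here `marked` interleaves one landmark after each 4-token block, so later attention only needs to score the landmarks to decide which blocks are worth attending to in full.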

FIT: Far-reaching Interleaved Transformers

The FIT architecture presents an efficient self-attention transformer-based method for high-resolution image understanding and generation that interleaves local attention within token groups and global attention over per-group latents, letting it operate on gigabit-scale data within a 16GB memory budget without specialized optimizations.
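As a very rough sketch (not the paper's exact architecture), an interleaved layer might attend locally within fixed-size groups and globally across one summary latent per group; the function names and the mean-pooled latent below are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Plain scaled dot-product attention on 2-D arrays.
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def interleaved_layer(x, group_size):
    """Local attention within groups, global attention over group latents.

    Assumes len(x) is divisible by group_size. Local cost scales with
    n * group_size rather than n**2 over the full sequence.
    """
    n, dim = x.shape
    groups = x.reshape(n // group_size, group_size, dim)
    local = np.stack([attention(g, g, g) for g in groups])
    latents = local.mean(axis=1)              # one summary vector per group
    mixed = attention(latents, latents, latents)  # global exchange
    # Broadcast each globally mixed latent back into its group.
    return (local + mixed[:, None, :]).reshape(n, dim)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
y = interleaved_layer(x, group_size=4)
```

The point of the interleaving is that the expensive quadratic attention only runs within small groups and over a short list of latents, never over the full sequence at once.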

Not Yet Another Digital ID: Privacy-preserving Humanitarian Aid Distribution

The paper proposes a decentralized solution for secure digital aid distribution that guarantees recipients' privacy, using tokens (smart cards or smartphones) to provide scalability, strong accountability, and security, developed in collaboration with the International Committee of the Red Cross.

The Larger They Are, the Harder They Fail: Language Models do not Recognize Identifier Swaps in Python

Large language models struggle to understand programming languages because they lack the innate sense of semantic invariances and equivariances that human programmers possess: when identifiers are swapped, the models make incorrect predictions, and, counterintuitively, larger models fail more often on such tasks that deviate from their training data.

Ghanaian Consumers Online Privacy Concerns: Causes and its Effects on E-Commerce Adoption

This research looks at the concerns faced by African online consumers, with a focus on Ghana, and how their perceived vulnerability to undesired collection and use of personal information can negatively impact their willingness to transact online.

Whose Baseline (compiler) is it anyway?

This paper discusses the tradeoff between compilation speed and code quality, particularly in the context of WebAssembly, and explores the design of a new single-pass compiler for a research Wasm engine.

Large Language Models are Few-Shot Health Learners

Researchers show that large language models can make meaningful health inferences from numerical data, such as wearable sensor readings, with only few-shot tuning.

A PhD Student's Perspective on Research in NLP in the Era of Very Large Language Models

A document compiled by PhD students in an academic research lab provides NLP research directions that are rich for exploration, acknowledging that although many NLP tasks have been improved by large language models, there remain open areas for further research.

Large Language Models Are Human-Level Prompt Engineers

The paper proposes Automatic Prompt Engineer (APE), a method for generating and selecting natural language instructions that maximize the task performance of large language models (LLMs) without relying on manually crafted prompts, achieving better or comparable performance to human-written prompts on most tasks.
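APE's generate-then-select loop can be sketched as follows; the stand-in `execute` function below replaces the LLM calls the paper actually uses, and the candidate instructions and evaluation pairs are invented for illustration:

```python
def execute(instruction, x):
    """Stand-in for running an LLM with `instruction` on input x."""
    if "reverse" in instruction.lower():
        return x[::-1]
    return x

def ape_select(candidates, eval_set):
    """Score each candidate instruction on a small eval set; keep the best."""
    def score(instruction):
        return sum(execute(instruction, x) == y for x, y in eval_set)
    return max(candidates, key=score)

candidates = ["Repeat the input.", "Reverse the input string."]
eval_set = [("abc", "cba"), ("hello", "olleh")]
best = ape_select(candidates, eval_set)
```

In the real method, the candidates themselves are also sampled from an LLM conditioned on input-output demonstrations; the selection step above is the same idea of ranking instructions by measured task performance.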