The paper proposes "Sophia," a simple and scalable second-order optimizer that estimates the diagonal Hessian as a preconditioner to improve the time and cost of training language models, achieving a 2x speed-up compared to Adam in language modeling with GPT-2 models of sizes ranging from 125M to 770M.
A new approach enhances transformers' ability to handle longer contexts by representing each block of the input with a landmark token; attention to these landmarks is used to select and retrieve the relevant blocks, which integrates cleanly with specialized data structures and enables processing of arbitrary context lengths while retaining random-access flexibility.
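A toy sketch of the retrieval idea follows; array shapes, function names, and the softmax details are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def landmark_retrieve(query, landmark_keys, block_keys, block_values, top_k=2):
    """Toy sketch of block retrieval via landmark tokens (names assumed).

    query:          (d,)      current query vector
    landmark_keys:  (B, d)    one representative key per block of the long context
    block_keys:     (B, L, d) keys of the tokens inside each block
    block_values:   (B, L, d) values of the tokens inside each block
    """
    d = query.shape[0]
    # Score each block by the query's attention to its landmark token,
    # then keep only the top_k most relevant blocks.
    block_scores = landmark_keys @ query                        # (B,)
    chosen = np.argsort(block_scores)[-top_k:]                  # best block indices
    # Run ordinary softmax attention over the tokens of the retrieved blocks.
    keys = block_keys[chosen].reshape(-1, d)                    # (top_k*L, d)
    values = block_values[chosen].reshape(-1, d)                # (top_k*L, d)
    logits = keys @ query / np.sqrt(d)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ values                                     # (d,) attended output
```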
FIT is an efficient self-attention transformer architecture for high-resolution image understanding and generation that can operate on gigabit-scale data within a 16GB memory budget without specialized optimizations.
In collaboration with the International Committee of the Red Cross, the paper proposes a decentralized solution for safe digital aid distribution that guarantees recipient privacy, using tokens (instantiated on smart cards or smartphones) to provide scalability, strong accountability, and security.
Large language models struggle to understand programming languages because they lack the innate sense of semantic invariances and equivariances that human programmers possess; this leads to incorrect predictions and limits their ability to handle tasks that deviate from their training data.
This research examines the privacy concerns of African online consumers, with a focus on Ghana, and how their perceived vulnerability to undesired collection and use of personal information can reduce their willingness to transact online.
This paper discusses the tradeoff between compilation speed and code quality, particularly in the context of WebAssembly, and explores the design of a new single-pass compiler for a research Wasm engine.
Researchers show that large language models can be adapted, with only few-shot tuning, to make meaningful health inferences from numerical data such as wearable sensor readings.
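As a hypothetical illustration of what grounding an LLM in numeric sensor data can look like, the sketch below serializes a few labeled readings into a few-shot prompt; the field names, wording, and example readings are all made up for this sketch rather than taken from the paper.

```python
def build_health_prompt(examples, new_reading):
    """Hypothetical few-shot prompt builder for numeric wearable data."""
    lines = []
    # Each demonstration pairs a numeric reading with its assessment,
    # so the model can infer the pattern in context.
    for series, label in examples:
        lines.append(f"Heart rate (bpm) over the last minute: {series}")
        lines.append(f"Assessment: {label}\n")
    # The new reading is appended with an open-ended assessment to complete.
    lines.append(f"Heart rate (bpm) over the last minute: {new_reading}")
    lines.append("Assessment:")
    return "\n".join(lines)

prompt = build_health_prompt(
    examples=[([62, 64, 63, 65], "resting"),
              ([148, 152, 150, 155], "vigorous exercise")],
    new_reading=[95, 97, 96, 98],
)
# `prompt` would then be sent to an LLM; the demonstrations let the model
# ground its answer in the numeric readings without task-specific training.
```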
A document compiled by PhD students in an academic research lab lays out NLP research directions that are ripe for exploration, acknowledging that although large language models have improved many NLP tasks, open areas for further research remain.
Automatic Prompt Engineer (APE) is a method for automatically generating and selecting natural language instructions that maximize the task performance of large language models (LLMs) without relying on manually crafted prompts; the resulting instructions match or outperform human-written prompts on most tasks.
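A minimal sketch of the generate-then-select loop behind this idea follows; `propose` and `score` stand in for LLM calls (candidate generation and task evaluation) and are assumed interfaces, not any particular library's API.

```python
def ape_select_instruction(propose, score, demos, eval_set, n_candidates=20):
    """Minimal sketch of an APE-style generate-then-select loop (illustrative)."""
    # 1. Ask a proposal LLM for candidate instructions that could have
    #    produced the demonstration input/output pairs.
    candidates = [propose(demos) for _ in range(n_candidates)]

    # 2. Score each candidate by how well the target LLM performs on the
    #    evaluation examples when prompted with that instruction.
    def avg_score(instruction):
        return sum(score(instruction, ex) for ex in eval_set) / len(eval_set)

    # 3. Keep the best-scoring instruction as the "engineered" prompt.
    return max(candidates, key=avg_score)
```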