Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy
Published on July 26, 2023 5:02 PM GMT. Summary: Many proposed AGI alignment procedures involve taking a pretrained model and traini...
AXRP Episode 23 - Mechanistic Anomaly Detection with Mark Xu
Published on July 27, 2023 1:50 AM GMT. Is there some way we can detect bad behaviour in our AI system without...
AXRP Episode 24 - Superalignment with Jan Leike
Published on July 27, 2023 4:00 AM GMT. Recently, OpenAI made a splash by announcing a new “Superalignment” te...
Big AI Won’t Stop Election Deepfakes With Watermarks
Experts warn of a new age of AI-driven disinformation. A voluntary agreement brokered by the White House doesn’t go nearly far eno...
Hollywood’s Strikes Will Disrupt Podcasts, Games, and TikTok Too
This week, we talk about how the changes in Hollywood fueling the writers’ and actors’ strikes will reach beyond TV and movies to ...
To Watermark AI, It Needs Its Own Alphabet
It's getting harder to distinguish between AI- and human-generated content. But Unicode presents an elegant hack in the race to wa...
Mech Interp Puzzle 2: Word2Vec Style Embeddings
Published on July 28, 2023 12:50 AM GMT. Code can be found here. No prior knowledge of mech interp or language models is required to...
Reducing sycophancy and improving honesty via activation steering
Published on July 28, 2023 2:46 AM GMT. Produced as part of the SERI ML Alignment Theory Scholars Program - Summer 2023 Cohort, unde...
When can we trust model evaluations?
Published on July 28, 2023 7:42 PM GMT. Thanks to Joe Carlsmith, Paul Christiano, Richard Ngo, Kate Woolverton, and Ansh Radhakrishn...
Open Problems and Fundamental Limitations of RLHF
Published on July 31, 2023 3:31 PM GMT. Reinforcement Learning from Human Feedback (RLHF) has emerged as the central alignment techn...
Watermarking considered overrated?
Published on July 31, 2023 9:36 PM GMT. Status: a slightly-edited copy-paste of a Twitter X thread I quickly dashed off a week or so...
Thoughts on sharing information about language model capabilities
Published on July 31, 2023 4:04 PM GMT. Core claim: I believe that sharing information about the capabilities and limits of existing M...
The “no sandbagging on checkable tasks” hypothesis
Published on July 31, 2023 11:06 PM GMT. (This post is inspired by Carl Shulman’s recent podcast with Dwarkesh Patel, which I highly...
ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks
Published on August 1, 2023 6:30 PM GMT. We have just released our first public report. It introduces methodolo...
AI in Medical Imaging Market Report Based on Development, Scope, Share, Trends, Forecast to 2029
The Global AI in Medical Imaging Market was valued at USD 1.12 billion in 2022 and is projected to reach USD 27.52 billion by 2029...
Water Flosser Market is expected to reach USD 1178 Mn by 2029
Global Water Flosser Market size was valued at USD 815.4 Mn in 2022 and is expected to reach USD 1178 Mn by 2029, at a CAGR of 5.4...
China state lenders lower dollar deposit rates for second time in a month - sources
Alibaba exploring options for video platforms Youku, Tudou - Bloomberg News
Meta loses as top EU court backs antitrust regulators over privacy breach checks
Chinese smartphones sales exceed 70% of Russian market
South Korea, Taiwan see limited impact of China curbs on chipmaking materials
Morgan Stanley banker Kim to join Deutsche Bank as Chairman of Asia Pacific M&A - memo
Trans Mountain pipeline expansion likely to send more Canadian oil to US, not Asia
Tesla, BYD's China deliveries hit record high in Q2
Where are strategic materials germanium and gallium produced?
EU concerned over China export controls on metals used in chips
China's SAIC doubles down on European expansion with EV plant plan
German industry urges reduced dependency after China export controls
Chinese hedge fund Dantai to liquidate flagship fund
Vietnam PM calls for looser monetary policies to fuel growth
India's Samvardhana Motherson to buy majority stake in Honda's unit
Airbus trials new wing designs in technology race with Boeing
Apple loses London appeal in 4G patent dispute with Optis
Shein in talks with banks and exchanges about US IPO: Report
UK financial services minister urges caution over central bank digital currency
EU Commission revamps procedures to speed up Big Tech privacy probes
ING Groep suing Chinese copper trader He Jinbi over unpaid debt - Bloomberg
Function calling and other API updates
We’re announcing updates including more steerable API models, function calling capabilities, longer context, and lower prices.
Introducing OpenAI London
We are excited to announce OpenAI’s first international expansion with a new office in London, United Kingdom.
Insights from global conversations
We are sharing what we learned from our conversations across 22 countries, and how we will be incorporating those insights moving ...