★ TALK ★
Exploring the lay of the LLM detection landscape
Saturday, 24 February
Are LLMs going to upend, or just end, the world? Will malevolent AIs spread disinformation and FUD to enslave humanity in a world of fear? Will Roko's Basilisk come to pass? To help stave off these dramatic end-times, LLM content detectors are here! We can build safe, AI-free zones to limit the digital “noise” that these models can blast out at scale, if only we can reliably detect and classify a text's origin.
This talk takes a deep dive into the leading LLM text detectors, both open-source and commercial, and compares them against a number of different datasets. Next, we throw into the mix ZipPy, a novel open-source detector based on code written in the mid-1980s that outperforms the state-of-the-art in a number of dimensions. ZipPy is simple (less than 200 lines of Python), and it codifies an intuition about a core difference between LLMs and humans that no amount of additional data or training compute can overcome: being unique! Using ZipPy, we walk through the features used to differentiate a text's origins, and show how, with a simple embedded detector, we can build a human-centric world where LLMs are used only to help us rather than subvert us.
Jacob is the Head of Labs at Thinkst Applied Research. Prior to that, he managed the HW/FW/VMM security team at AWS and was a Program Manager at DARPA's Information Innovation Office (I2O). At DARPA he managed a cyber security R&D portfolio including the Configuration Security, Transparent Computing, and Cyber Fault-tolerant Attack Recovery programs. Starting his career at Assured Information Security, he led the Computer Architectures group, performing bespoke research into low-level systems security and programming languages. Jacob has been a speaker and keynote speaker at conferences around the world, from Black Hat USA, to SyScan, to TROOPERS and many more. When not in front of the computer, he enjoys trail running, volunteering as a firefighter/EMT, and hiking with his family.