publications

publications by category in reverse chronological order. generated by jekyll-scholar.

2025

  1. Adversarial Manipulation of Reasoning Models using Internal Representations
    Kureha Yamaguchi, Benjamin Etheridge, and Andy Arditi
    2025
  2. 2DSig-Detect: a semi-supervised framework for anomaly detection on image data using 2D-signatures
    Xinheng Xie, Kureha Yamaguchi, Margaux Leblanc, Simon Malzard, Varun Chhabra, Victoria Nockles, and Yue Wu
    2025

2024

  1. An AI red team playbook
    Anna Raney, Shiri Bendelac, Keith Manville, Mike Tan, and Kureha Yamaguchi
    https://doi.org/10.1117/12.3021906, Jun 2024
  2. An AI blue team playbook
    Mike Tan, Kureha Yamaguchi, Anna Raney, Victoria Nockles, Margaux Leblanc, and Shiri Bendelac
    https://doi.org/10.1117/12.3021908, Jun 2024