Emily Sheng

Publications

2026

Agarwal, D.*, Sheng, E.*, Atalla, C., Garcia-Gathright, J., Mozannar, H., Washington, H., Chouldechova, A., Barocas, S., & Wallach, H. (2026). AI-assisted systematization for evaluating GenAI systems. paper. This work was also a critical part of ASSERT.

2025

Harvey, E., Sheng, E., Blodgett, S. L., Chouldechova, A., Garcia-Gathright, J., Olteanu, A., & Wallach, H. (2025). Understanding and meeting practitioner needs when measuring representational harms caused by LLM-based systems. In Findings of the Association for Computational Linguistics: ACL 2025 (pp. 18423–18440). Association for Computational Linguistics. paper

Corvi, E., Washington, H., Reed, S., Atalla, C., Chouldechova, A., Dow, P. A., Garcia-Gathright, J., Pangakis, N. J., Sheng, E., Vann, D., Vogel, M., & Wallach, H. (2025). Taxonomizing representational harms using speech act theory. In Findings of the Association for Computational Linguistics: ACL 2025 (pp. 3907–3932). Association for Computational Linguistics. paper

Wallach, H., Desai, M., Cooper, A. F., Wang, A., Atalla, C., Barocas, S., Blodgett, S. L., Chouldechova, A., Corvi, E., Dow, P. A., Garcia-Gathright, J., Olteanu, A., Pangakis, N. J., Reed, S., Sheng, E., Vann, D., Vaughan, J. W., Vogel, M., Washington, H., & Jacobs, A. Z. (2025). Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge. In Proceedings of the 42nd International Conference on Machine Learning, Proceedings of Machine Learning Research. paper

2024

Chouldechova, A., Atalla, C., Barocas, S., Cooper, A. F., Corvi, E., Dow, P. A., Garcia-Gathright, J., Pangakis, N., Reed, S., Sheng, E., Vann, D., Vogel, M., Washington, H., & Wallach, H. (2024). A shared standard for valid measurement of generative AI systems' capabilities, risks, and impacts. paper

2023

Magooda, A., Helyar, A., Jackson, K., Sullivan, D., Atalla, C., Sheng, E., Vann, D., Edgar, R., Palangi, H., Lutz, R., Kong, H., Yun, V., Kamal, E., Zarfati, F., Wallach, H., Bird, S., & Chen, M. (2023). A framework for automated measurement of responsible AI harms in generative AI applications. paper

Fleisig, E., Amstutz, A., Atalla, C., Blodgett, S. L., Daumé III, H., Olteanu, A., Sheng, E., Vann, D., & Wallach, H. (2023). FairPrism: Evaluating fairness-related harms in text generation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. paper

2022

Dev, S.*, Sheng, E.*, Zhao, J.*, Amstutz, A.*, Sun, J., Hou, Y., Sanseverino, M., Kim, J., Peng, N., & Chang, K.-W. (2022). On Measures of Biases and Harms in NLP. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Findings) (AACL-IJCNLP-Findings 2022). paper

2021

Sheng, E. (Aug 2021). Fairness in Natural Language Generation. PhD Thesis. paper

Sheng, E., Chang, K.-W., Natarajan, P., & Peng, N. (2021). Societal Biases in Language Generation: Progress and Challenges. In Proceedings of the 2021 Conference of the Association for Computational Linguistics (ACL 2021). paper

Sheng, E., Chang, K.-W., Natarajan, P., & Peng, N. (2021). "Nice Try, Kiddo": Investigating Ad Hominems in Dialogue Responses. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2021). paper

Sheng, E., Arnold, J., Yu, Z., Chang, K.-W., Natarajan, P., & Peng, N. (2021). Revealing Persona Biases in Dialogue Systems. arXiv preprint arXiv:2104.08728. paper

2020

Sheng, E., Chang, K.-W., Natarajan, P., & Peng, N. (2020). Towards Controllable Biases in Language Generation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (Findings) (EMNLP-Findings 2020). paper poster

Sheng, E., & Uthus, D. (2020). Investigating Societal Biases in a Poetry Composition System. In Proceedings of the 2nd Gender Bias in NLP Workshop. paper

2019

Sheng, E., Chang, K.-W., Natarajan, P., & Peng, N. (2019). The Woman Worked as a Babysitter: On Biases in Language Generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP 2019). paper slides

Earlier Works

Sheng, E., & Natarajan, P. (2018). A Byte-sized Approach to Named Entity Recognition. arXiv preprint arXiv:1809.08386. paper

Sheng, E., Miller, S., Ambite, J. L., & Natarajan, P. (2017). A Neural Named Entity Recognition Approach to Biological Entity Identification. In Proceedings of the BioCreative VI Workshop. paper

Sheng, E., Natarajan, P., Gordon, J., & Burns, G. (2017). An Investigation into the Pedagogical Features of Documents. In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications (pp. 109–120). paper

Gordon, J., Aguilar, S., Sheng, E., & Burns, G. (2017). Structured generation of technical reading lists. In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications (pp. 261–270). paper
