Jack Bandy
Clinical Assistant Professor
Department of Computer Science
Contact
Building & Room:
CDRLC 3454
Address:
850 W. Taylor St., Chicago IL 60607
Office Hours
| Day | Hours |
|---|---|
| Sunday | |
| Monday | 10:00am – 11:30am |
| Tuesday | |
| Wednesday | 10:00am – 11:30am |
| Thursday | |
| Friday | |
| Saturday | |
Selected Publications
Hagar, Nick, and Jack Bandy. “Practical Datasets for Analyzing LLM Corpora Derived from Common Crawl.” In Proceedings of the International AAAI Conference on Web and Social Media, vol. 19, pp. 2454-2464. 2025. https://doi.org/10.1609/icwsm.v19i1.35948
Bandy, Jack. “Problematic machine behavior: A systematic literature review of algorithm audits.” Proceedings of the ACM on Human-Computer Interaction 5, no. CSCW1 (2021): 1-34. https://doi.org/10.1145/3449148
Bandy, Jack, and Nicholas Vincent. “Addressing ‘documentation debt’ in machine learning: A retrospective datasheet for BookCorpus.” In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track. 2021.
Bandy, Jack, and Nicholas Diakopoulos. “Facebook’s news feed algorithm and the 2020 US election.” Social Media + Society 9, no. 3 (2023). https://doi.org/10.1177/20563051231196898
Education
Ph.D. in Computer Science & Communication Studies from Northwestern University (2023)
M.S. in Computer Science from University of Kentucky (2018)
B.S. in Computer Science from Wheaton College, IL (2016)
Research Currently in Progress
Most of my time and energy is currently channeled toward teaching. I also participate in several ongoing research projects related to ethics and computing, outlined below. If you are interested in collaborating in some way, please reach out!
- How do different LLM training datasets balance scale and quality?
- How might filtering and preprocessing choices shape model behavior?
- How can dataset diversity and quality be measured and improved according to concrete values?
- What systems could enable large-scale detection of copyrighted, private, and/or non-consensual data?
  - (Systems of governance as well as technical systems)
  - Data provenance disclosures
Risks and Harms from Large Language Models
- How do LLMs replicate/amplify different forms of bias, discrimination, and/or misrepresentation?
- How do LLMs interact with democratic discourse?
- What ethical frameworks can be used to evaluate the safety, accuracy, and reliability of model inputs and outputs?
- How do feed algorithms amplify and/or suppress different sources?
- How can the quality of a feed be measured and improved?
- What role do algorithms play in broadcasting low-quality content, misinformation, etc.?
Computer Science Education
- What kinds of interactive projects and exercises are effective for teaching dataset ethics?
- How can practices related to transparency, documentation, and auditing best be incorporated into curricula?
- What helps students and/or practitioners grow in moral and ethical decision-making?