This is an OCR-NLP information extraction project. We use a randomly sampled set of 955 sleep study reports (stored as images in PDF files) from the University of Texas Medical Branch to develop a data pipeline for ...