•An effective way to leverage emerging technologies to solve common, complex problems that are currently handled manually. A practical way to use ML/AI together with natural language processing to scan large volumes of text. The primary goal is to identify disease-related keywords, and their frequencies, using tokenization and lemmatization. Open-source tools such as the FuzzyWuzzy fuzzy string matching library and document-frequency term weighting were used extensively (see the first sketch after the tools list).
•Our Business Operations group at Kaiser captures long, unstructured text for each research subject it works on (about 6,000 to 10,000 a year). This unstructured data is captured in long text fields and sometimes runs to multiple pages per subject. We created models that scan through these volumes of text, identify keywords using Python spaCy models, and predict the area and type of each study (see the second sketch after the tools list).
•Tools used: Python, SQL, Oracle Cloud Analytics, Oracle Cloud Reporting, and Oracle Data Science services
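As a rough illustration of the matching step, here is a minimal sketch (not the speakers' actual code) that pairs FuzzyWuzzy's token_set_ratio with a simple document-frequency count; the DISEASE_TERMS vocabulary and the 85-point threshold are hypothetical placeholders.

    # Minimal sketch: fuzzy-match a disease vocabulary against free text
    # and count document frequency. The vocabulary and threshold are
    # illustrative assumptions, not Kaiser's actual configuration.
    from collections import Counter

    from fuzzywuzzy import fuzz  # pip install fuzzywuzzy

    DISEASE_TERMS = ["diabetes mellitus", "hypertension", "breast cancer"]

    def match_terms(document: str, threshold: int = 85) -> list[str]:
        """Return vocabulary terms that fuzzily match the document text."""
        return [
            term
            for term in DISEASE_TERMS
            if fuzz.token_set_ratio(term, document) >= threshold
        ]

    def document_frequency(documents: list[str]) -> Counter:
        """Count, for each term, the number of documents it matches."""
        df = Counter()
        for doc in documents:
            df.update(set(match_terms(doc)))
        return df

    docs = [
        "Subject enrolled in a study of type 2 diabetes mellitus outcomes.",
        "Long-term hypertension cohort; subject also reports renal disease.",
    ]
    print(document_frequency(docs))  # per-term count of matching documents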
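And a sketch of the spaCy step: tokenize and lemmatize the free text, count keyword lemmas, and pick the study area whose keywords occur most often. The AREA_KEYWORDS mapping is an invented stand-in for the real reference data.

    # Sketch of keyword extraction and a naive area prediction with spaCy.
    from collections import Counter

    import spacy  # python -m spacy download en_core_web_sm

    nlp = spacy.load("en_core_web_sm")

    # Hypothetical mapping of study areas to lemmatized keywords.
    AREA_KEYWORDS = {
        "oncology": {"tumor", "cancer", "chemotherapy"},
        "cardiology": {"hypertension", "cardiac", "arrhythmia"},
    }

    def keyword_counts(text: str) -> Counter:
        """Lemma frequencies for content words, via spaCy tokenization."""
        doc = nlp(text)
        return Counter(
            token.lemma_.lower()
            for token in doc
            if token.is_alpha and not token.is_stop
        )

    def predict_area(text: str) -> str | None:
        """Pick the area whose keywords appear most often in the text."""
        counts = keyword_counts(text)
        scores = {
            area: sum(counts[kw] for kw in kws)
            for area, kws in AREA_KEYWORDS.items()
        }
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else None

    notes = "Subject receiving chemotherapy; tumor response assessed quarterly."
    print(predict_area(notes))  # oncology, given the placeholder mapping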
Speakers: Pratap Madgula and Tony Schollum, Kaiser Permanente