Daniel Cook | Voxxed Days

Voxxed Days Bristol 2018
on Thursday 25 October

Dan is an independent Big Data Technical Architect. He has led the development of a Hadoop-as-a-Service offering and built software streaming frameworks long before the rise of Apache Spark and Storm. He’s well versed in the knowledge domain, building applications on RDF and SPARQL that collapse many information systems into a 360-degree view of a customer, drawing on both structured and unstructured data.

Building databases from unstructured text with NLP


It’s easy to take unstructured data and extract entities as well as the relationships between them. Until now, it felt as though building your own knowledge graph was achievable only by those with a PhD or Master’s in Data Science. The truth is we can all get in on the act, and this talk shows how.

For many years we’ve tried to move clients from storing their knowledge in unstructured form (on paper at worst, in Word documents at best) to curating it in a more structured form, normally through a swanky UI backed by a database that enables efficient query and analytics.

These projects take time and money to deliver, and if you leave a box on the UI with a large character limit, you can be sure it’ll be filled with yet more unstructured data. In the worst case, the user goes back to their Word doc.

But there’s another way: let the user carry on using the tools they find familiar, and run a pipeline to build the database for query and analytics. Expect practical NLP in this talk without the academic “it’s hard” statements, along with triplestores and SPARQL. And no talk would be complete without saying the word “ontology” several times.