Gareth Rogers | Voxxed Days

Voxxed Days Bristol 2018
on Thursday 25 October

I’m a Data Engineer at Metail, where I’ve worked for 6 years. Over the last 4 years I’ve been part of the team first building and then keeping Metail’s data analytics pipeline up to date and able to meet our changing demands. This has meant deciding where to keep up with a rapidly changing field and where to enjoy some stability. I came to Metail after graduating with a PhD in high energy physics based on the LHCb experiment at CERN. There I spent too much time working on the control system and monitoring software, but I still managed to code up and version control my analysis. I haven’t really seen a hill since leaving Geneva and I’m hoping to have some time to attempt a run up from the river to the Clifton Suspension Bridge.

See also https://metail.com/

Putting the Spark in Functional Fashion Tech Analytics

Conference

Metail is a fashion tech startup whose goal is to reduce the cost and improve the efficiency of a retailer’s garment photography process and to give consumers confidence in the clothes they buy online. By allowing customers to try clothes on online on their own body shape, we’ve been able to collect a unique data set of customers’ clothes shopping habits along with their body shape data.

Metail’s analytics platform, now four years old, drives our data science products and the internal and external dashboards that give a summarised view of key business metrics. The pipeline is based on the ideas in Nathan Marz’s lambda architecture and uses the Snowplow analytics pipeline as a foundation for our event tracking, collection and first-pass processing. From the start, the pipeline has been implemented in Clojure: we use it to connect our pipeline stages, and its big data libraries are the workhorse of our raw event processing and aggregation.

This talk will show how we’ve used Clojure to provide a solid platform to connect and manage our AWS-hosted analytics pipeline. I aim to convince you that Clojure is a strong platform and that, with its open source libraries, it’s a good fit for big data pipelines.