Using Shapeless for Data Cleaning in Apache Spark

When it comes to importing data into a big data infrastructure like Hadoop, Apache Spark is one of the most widely used tools for ETL jobs. Because input data – in this case CSV – often contains invalid values, a data cleaning layer is needed. Most data cleaning tasks are very specific and therefore have to be implemented for your particular data, but some tasks can be generalized. In this post, I won’t go into Spark, ETL or big data in general, but will present one approach to removing null / empty values from a data set. [Read More]

Java Libs in Scala - A bit more Functional

Every Java library can be used in Scala, which is, for me, one of the good parts of the JVM world. But Java libs are mostly object-oriented rather than functional, and are therefore full of side effects and sometimes “ugly” to use in Scala. There are, however, some ways to make Java libs (or their interfaces) more functional, so they can be used almost like a Scala lib. Java 8 Type Conversion: Many Java types like Map or List, but also functional types (Java 8) like Optional<T>, have Scala counterparts. [Read More]

Overcoming Checked Exceptions in Java Lambdas

In Java 8, the long-awaited lambda came to life, making it easy(-er) to do FP in Java. One problem I came across is that most Java code throws checked exceptions, which leads to IMHO ugly try/catch blocks in lambdas:

    Function<A, B> fun = a -> {
        try {
            // some function call that throws a checked exception
            return callFn(a);
        } catch (Exception e) {
            // return failure result
        }
    };

The Good, the Bad and the Ugly: A really simple, but also not really nice option is to wrap thrown exceptions into an unchecked one: [Read More]
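The excerpt stops before showing the wrapping option, so here is a minimal sketch of what it could look like, not the post's own code. The names `Unchecked`, `ThrowingFunction`, `unchecked` and `parsePositive` are hypothetical and were made up for this illustration; only `java.util.function.Function` comes from the JDK.

```java
import java.util.function.Function;

final class Unchecked {

    // Hypothetical helper type: like Function, but apply may throw a checked exception.
    @FunctionalInterface
    interface ThrowingFunction<A, B> {
        B apply(A a) throws Exception;
    }

    // Adapts a throwing function to a plain Function by rethrowing
    // any checked exception wrapped in a RuntimeException.
    static <A, B> Function<A, B> unchecked(ThrowingFunction<A, B> f) {
        return a -> {
            try {
                return f.apply(a);
            } catch (RuntimeException e) {
                throw e; // already unchecked, pass through unchanged
            } catch (Exception e) {
                throw new RuntimeException(e); // original exception kept as the cause
            }
        };
    }

    // Hypothetical stand-in for "some function call that throws a checked exception".
    static Integer parsePositive(String s) throws Exception {
        int n = Integer.parseInt(s.trim());
        if (n <= 0) {
            throw new Exception("not positive: " + s);
        }
        return n;
    }

    public static void main(String[] args) {
        // The try/catch now lives in one place instead of in every lambda.
        Function<String, Integer> fun = unchecked(Unchecked::parsePositive);
        System.out.println(fun.apply(" 42 "));
    }
}
```

This keeps the lambda itself short, but it is "not really nice" in the sense the excerpt hints at: the failure is no longer visible in the function's type, and callers only learn about it at runtime by catching `RuntimeException` and inspecting its cause.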