Trifork Blog

Deep Learning for Natural Language Processing – Part II

1 Comment

Author – Wilder Rodrigues

Wilder continues his series about NLP. This time he would like to bring you into the Deep Learning realm, exploring Deep Neural Networks for sentiment analysis.

If you are already familiar with these types of networks and know why certain choices are made, you can skip the first section and go straight to the next one.

I promise the decisions I made in terms of the train / validation / test split won’t disappoint you. As a matter of fact, training the same models with different sets got me better results than those achieved by Dr. Jon Krohn, from untapt, in his Live Lessons.

From what I have seen in the last two years, I think we have all been through plenty of explanations of shallow, intermediate and deep neural networks. So, to save us some time, I will avoid revisiting them here. We will dive straight into the arsenal we will be using throughout this story. However, we won’t just go through a list of things; instead, we will understand why those things are being used.
Read the rest of this entry »

Posted in: Machine Learning

Deep Learning for Natural Language Processing – Part I

No Comments

Author – Wilder Rodrigues

Nowadays, the task of natural language processing has been made easier by the advancements in neural networks. In the past 30 years, after the last AI Winter, amongst the many papers that have been published, some have been in the area of NLP, focusing on distributed word-to-vector representations.

The papers in question are listed below (including the famous back-propagation paper that brought life to Neural Networks as we know them):
Read the rest of this entry »

Posted in: Machine Learning

How to manage Database Migrations with Flyway?

No Comments

Joris Kuipers, CTO at Trifork, presented a webinar on some usage patterns for Flyway. You can find the recording on our Trifork YouTube channel.

Tools like Flyway address a common concern for many people, which quickly leads to questions on how to pick a tool and then apply it in the best manner for one’s particular situation.

In this blog post, Joris has summarized the Q&A session – he provides the readers with his answers and ideas for managing database migrations.

1. How does Flyway compare to Liquibase?

When I was choosing a DB schema migration tool 4 or 5 years ago, I looked at both Liquibase and Flyway. In general, I’d say that Flyway is a bit more lightweight: Liquibase addresses some requirements that Flyway explicitly chooses not to support. These include support for defining migrations declaratively (e.g. in XML) and then generating the correct SQL DDL statements for your particular DBMS, and support for generating migration rollbacks (countering operations).
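For readers new to Flyway: the core idea is that you keep plain, versioned SQL scripts under version control and let Flyway apply the pending ones in order. A minimal programmatic sketch, assuming the Flyway 4.x-era Java API (the connection details and script name are made-up placeholders):

import org.flywaydb.core.Flyway;

public class Migrator {

    public static void main(String[] args) {
        Flyway flyway = new Flyway();
        // Placeholder connection details for this sketch.
        flyway.setDataSource("jdbc:postgresql://localhost:5432/app", "app", "secret");
        // Applies any versioned scripts (e.g. V1__create_tables.sql) found on the
        // classpath under db/migration that haven't been recorded as applied yet.
        flyway.migrate();
    }
}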
Read the rest of this entry »

Posted in: Custom Development

Using Axon with PostgreSQL without TOAST

No Comments

The client I work for at this time is leveraging Axon 3. The events are stored in a PostgreSQL database. PostgreSQL uses a thing called TOAST (The Oversized-Attribute Storage Technique) to store large values.

From the PostgreSQL documentation:

“PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. To overcome this limitation, large field values are compressed and/or broken up into multiple physical rows”

As it happens, in our setup using JPA (Hibernate) to store events, the DomainEventEntry entity has a @Lob annotation on the payload and the metaData fields (via extension of the AbstractEventEntry class):
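What follows is a simplified sketch of that mapping, not Axon’s actual source (the real AbstractEventEntry is generic and carries more fields):

import javax.persistence.Basic;
import javax.persistence.Lob;
import javax.persistence.MappedSuperclass;

// Simplified for illustration; see Axon's AbstractEventEntry for the real mapping.
@MappedSuperclass
public abstract class EventEntrySketch {

    @Lob
    @Basic
    private byte[] payload;

    @Lob
    @Basic
    private byte[] metaData;
}

With Hibernate on PostgreSQL, a @Lob byte[] property is mapped as a large object, which is why the column ends up with the OID type.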

For PostgreSQL this will result in events that are not easily readable:

SELECT payload FROM domainevententry;

| payload |
| 24153   |

The data type of the payload column of the domainevententry table is OID.

The PostgreSQL JDBC driver obviously knows how to deal with this: the real content is deTOASTed lazily. Using PL/pgSQL it is possible to write a value to a file, but this needs to be done value by value. When you are debugging your application and want a quick look at its events, this is not a fun route to take.
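For example, one value at a time can be exported with the server-side lo_export function (a sketch; it requires superuser rights and writes to the file system of the database server):

-- Export the payload large object of one event to a file on the DB server.
SELECT lo_export(payload, '/tmp/event-payload.bin')
FROM domainevententry
LIMIT 1;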

So we wanted to change the data type in our database to something more human-readable: BYTEA, for example, which can store large values yet is still readable. As it turned out, a couple of changes are needed to get it working.
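On the database side, the conversion itself can be done with a one-off statement along these lines (a sketch of the general idea rather than our exact migration; lo_get is available from PostgreSQL 9.4 onwards, and the metaData column needs the same treatment):

-- Inline the large-object contents into the column and switch its type to BYTEA.
ALTER TABLE domainevententry
  ALTER COLUMN payload TYPE BYTEA USING lo_get(payload);

Note that this statement does not remove the old large objects themselves; those can be cleaned up separately, for example with the vacuumlo utility.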

It took me a while to get all the pieces I needed. Although the solution I present here works for us, it may not be the most elegant or even the best solution for everyone.
Read the rest of this entry »

Posted in: Java

Kibana Histogram on Day of Week

No Comments

I keep track of my daily commutes to and from the office. One thing I want to know is how the different days of the week affect my travel duration. But when indexing all my commutes into Elasticsearch, I cannot create a histogram on the day of the week out of the box. My first visualization will look like this:

Read the rest of this entry »

Posted in: Elasticsearch

Smart energy consumption insights with Elasticsearch and Machine Learning

1 Comment

At home we have a Youless device which can be used to measure energy consumption. You have to mount it to your energy meter so it can monitor your usage. The device then provides energy consumption data via a RESTful API. We can use this API to index energy consumption data into Elasticsearch every minute and then gather insights by using Kibana and X-Pack Machine Learning.
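To give a feel for what “index energy consumption data every minute” boils down to, here is a hand-rolled sketch of such a polling loop in Java (the Youless address, endpoint and index name are made-up placeholders; the setup described in this post uses Httpbeat for this instead):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class YoulessPoller {

    public static void main(String[] args) throws Exception {
        while (true) {
            // Placeholder address of the Youless device and its JSON endpoint.
            String reading = fetch("http://192.168.1.50/e");
            // Placeholder Elasticsearch index and type.
            index("http://localhost:9200/energy/reading", reading);
            Thread.sleep(60_000); // one sample per minute, as in this post
        }
    }

    private static String fetch(String url) throws IOException {
        try (InputStream in = new URL(url).openStream();
             Scanner scanner = new Scanner(in, StandardCharsets.UTF_8.name()).useDelimiter("\\A")) {
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    private static void index(String url, String json) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("POST");
        connection.setRequestProperty("Content-Type", "application/json");
        connection.setDoOutput(true);
        try (OutputStream out = connection.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        connection.getResponseCode(); // execute the request; response body ignored in this sketch
    }
}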

The goal of this blog is to give a practical guide on how to set up and understand X-Pack Machine Learning, so you can use it in your own projects! After completing this guide, you will have the following up and running:

  • A complete data pre-processing and ingestion pipeline, based on:
    • Elasticsearch 5.4.0 with ingest node;
    • Httpbeat 3.0.0.
  • An energy consumption dashboard with visualizations, based on:
    • Kibana 5.4.0.
  • Smart energy consumption insights with anomaly detection, based on:
    • Elasticsearch X-Pack Machine Learning.

The following diagram gives an architectural overview of how all components are related to each other:

Read the rest of this entry »

Posted in: Docker | Elasticsearch | Machine Learning

Heterogeneous microservices

1 Comment

Microservices architecture is increasingly popular nowadays. One of its promises is flexibility and easier working in larger organizations, by reducing the amount of communication and coordination between teams. The thinking is that teams have their own service(s) and don’t depend on other teams, meaning they can work independently, thereby reducing coordination efforts.

Especially with multiple teams and multiple services per team, this can mean there are quite a few services with quite different usage. Different teams can have different technology preferences, for example because they are more familiar with one technology or another. Similarly, different usage can mean quite different requirements, which might be easier to fulfil with one technology than with another.

The question I’m going to discuss in this blog post is: how free or constrained should technology choices be in such an environment?

Read the rest of this entry »

Posted in: Custom Development

Interview with Sam Newman, author of Building Microservices

1 Comment

After living in Australia for the last five years, the Londoner and author of the well-received Building Microservices has returned home to focus on his business as an independent consultant. We caught up via Skype to discuss his upcoming visit to Amsterdam and the tech trends that he is keeping his eye on.

Reading time: Less than 5 minutes

Read the rest of this entry »

Posted in: Knowledge | Newsletter | Training

How to send your Spring Batch Job log messages to a separate file

1 Comment

In one of my current projects we’re developing a web application which also has a couple of dozen batch jobs that perform all sorts of tasks at particular times. These jobs produce quite a bit of logging output when they run, which is important for seeing exactly what happened during a job. What we noticed, however, is that the batch logging made it hard to quickly spot the other logging performed by the application while a batch job was also running. In addition, it wasn’t always clear in the context of which job a log statement was issued.
To address these issues I came up with a simple solution based on Logback Filters, which I’ll describe in this blog.

Logback Appenders

We’re using Logback as a logging framework. Logback defines the concept of appenders: appenders are responsible for handling the actual log messages emitted by the loggers in the application by writing them to the console, to a file, to a socket, etc.
Many applications define one or more appenders and then simply list them all as part of their root logger section in the logback.xml configuration file:

<configuration scan="true">

  <appender name="LOGSTASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
    <destination>logstash-server</destination>
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>

  <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>log/server.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <fileNamePattern>log/server.%d{yyyy-MM-dd}.log</fileNamePattern>
      <maxHistory>30</maxHistory>
    </rollingPolicy>
    <encoder>
      <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %mdc %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="info">
    <appender-ref ref="LOGSTASH"/>
    <appender-ref ref="FILE"/>
  </root>

</configuration>

This setup will send all log messages to both of the configured appenders.
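As a teaser for the filter-based approach: Logback allows an appender to accept or deny events based on an expression. Here is a minimal sketch that routes only events carrying a (made-up) jobName MDC key to a dedicated file, using Logback’s EvaluatorFilter with a Janino expression (requires the Janino library on the classpath):

<appender name="BATCH" class="ch.qos.logback.core.FileAppender">
  <file>log/batch.log</file>
  <filter class="ch.qos.logback.core.filter.EvaluatorFilter">
    <evaluator class="ch.qos.logback.classic.boolex.JaninoEventEvaluator">
      <!-- Only accept events logged while a job name was present in the MDC. -->
      <expression>mdc != null &amp;&amp; mdc.get("jobName") != null</expression>
    </evaluator>
    <onMatch>ACCEPT</onMatch>
    <onMismatch>DENY</onMismatch>
  </filter>
  <encoder>
    <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %mdc %-5level %logger{36} - %msg%n</pattern>
  </encoder>
</appender>

Combined with a matching filter that keeps these events out of the main appenders, this separates job logging from the rest; the full post describes the actual setup. Read the rest of this entry »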

Posted in: DevOps | From The Trenches | Java

Machine Learning: Predicting house prices

No Comments

Recently I followed an online course on machine learning to better understand the current hype. As with any subject, though, only practice makes perfect, so I was looking to apply this new knowledge.

While looking to sell my house, I found a nice opportunity to do just that: check whether the prices real estate agents estimate are in line with what the data suggests.

Linear regression should be a nice fit here: the algorithm tries to find the best linear prediction (y = a + bx1 + cx2, where y is the prediction and x1, x2 are the variables). For example, it can estimate a price per square meter of floor space or per square meter of garden. For a more detailed explanation, check out the Wikipedia page.

In the Netherlands funda is the main website for selling your house, so I started by collecting some data: I used the 50 houses closest to my house, excluding apartments to limit the data to properties similar to mine. For each house I collected the advertised price, usable floor space, lot size, number of (bed)rooms, type of house (row house, corner house, or detached) and year of construction (..-1930, 1931-1940, 1941-1950, 1951-1960, etc.). These are the (easily available) variables I expected to influence the house price the most. Type of house is a categorical variable; to use it in a regression I modeled it as several binary (0/1) variables.

As preparation, I checked for relations between the variables using correlation. This showed that much of the collected data does not seem to affect the price: only floor space, lot size and number of rooms showed a significant correlation with the house price.

For the regression analysis I only used the variables that had a significant correlation. Variables without correlation would not produce meaningful results anyway.
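To make this concrete, here is a minimal sketch of how such a regression could be run with Apache Commons Math (the library choice and the sample numbers are my own illustration, not the tooling used for this post):

import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression;

public class HousePriceRegression {

    public static void main(String[] args) {
        // Each row holds the variables that showed a significant correlation:
        // floor space (m2), lot size (m2), number of rooms. Made-up sample data.
        double[][] features = {
            {120, 250, 4},
            { 95, 180, 3},
            {150, 400, 5},
            {110, 220, 4},
            {130, 300, 4},
        };
        // Advertised prices in euros (also made up).
        double[] prices = {325_000, 250_000, 450_000, 300_000, 365_000};

        OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression();
        regression.newSampleData(prices, features);

        // beta[0] is the intercept (a); beta[1..3] are the coefficients (b, c, ...).
        double[] beta = regression.estimateRegressionParameters();
        double predicted = beta[0] + beta[1] * 125 + beta[2] * 260 + beta[3] * 4;
        System.out.printf("Predicted price: %.0f%n", predicted);
    }
}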

Read the rest of this entry »

Posted in: Custom Development | General | Machine Learning