Saturday, June 6, 2015

Csv4j - Deserialize CSV Files into Java Objects

Have you ever had to read CSV files and turn each row into a Java object of some domain type in your app? It is a boring and cumbersome task. And if the data is incomplete, e.g. some CSV files have more or fewer fields than the ones you care about, it becomes even more boring and cumbersome. It should not be! Java offers the means to do this easily. All you have to do is create a Java domain type (class) with the fields of interest and match its Java fields to CSV header fields. Leave the rest to csv4j!

Brief Summary

Csv4j is a standalone application written in Java 8. Annotations can optionally be used to match a Java field to one or more CSV header fields. In the absence of annotations, csv4j matches Java fields to CSV fields of the same name. Csv4j then uses reflection to set each matched field to the value read from the CSV file.
Basic assumptions follow:
  1. The first line of a CSV input file has to be the header line that defines the fields.
  2. The remaining lines contain data, and each line corresponds to one instance of the domain type.
  3. A Java domain type has to be given as input, with:
    • a public, no-argument constructor;
    • non-final fields to be set with values from CSV data (additional fields are allowed and may or may not be final; this is irrelevant to csv4j);
    • a standard setter method per field that needs to be set;
    • each field's type has to define a static factory "valueOf" method. Primitive types are allowed as they are wrapped by csv4j, well by Guava :)
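
The mechanism described above (match header names to Java fields, convert each string through valueOf, call the setter) can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not csv4j's code or API: the names Point, ReflectionSketch, wrap, convert and hydrate are hypothetical, the wrap helper stands in for Guava's Primitives.wrap, String fields are passed through unchanged, and lines are split naively on commas with no quoting support.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical domain type that meets the assumptions listed above.
class Point {
    private int x;
    private String label = "n/a"; // kept when the CSV has no "label" column

    public Point() { }

    public void setX(int x) { this.x = x; }
    public void setLabel(String label) { this.label = label; }
    public int getX() { return x; }
    public String getLabel() { return label; }
}

// Illustrative only; not csv4j's implementation or API.
class ReflectionSketch {

    // Stand-in for Guava's Primitives.wrap, limited to two primitives for brevity.
    static Class<?> wrap(Class<?> type) {
        if (type == int.class) return Integer.class;
        if (type == double.class) return Double.class;
        return type;
    }

    // Converts the raw CSV string through the static valueOf(String) factory of the field's type.
    static Object convert(Class<?> type, String raw) throws ReflectiveOperationException {
        if (type == String.class) return raw; // already a String, nothing to convert
        return wrap(type).getMethod("valueOf", String.class).invoke(null, raw);
    }

    // First line is the header; every other line becomes one instance of the given type.
    static <T> List<T> hydrate(Class<T> type, List<String> lines) {
        String[] header = lines.get(0).split(",");
        List<T> result = new ArrayList<>();
        try {
            for (String line : lines.subList(1, lines.size())) {
                String[] values = line.split(",");
                T instance = type.getConstructor().newInstance();
                for (int i = 0; i < header.length && i < values.length; i++) {
                    Field field;
                    try {
                        field = type.getDeclaredField(header[i]);
                    } catch (NoSuchFieldException e) {
                        continue; // CSV column without a matching Java field is ignored
                    }
                    String setter = "set" + Character.toUpperCase(header[i].charAt(0))
                            + header[i].substring(1);
                    type.getMethod(setter, field.getType())
                            .invoke(instance, convert(field.getType(), values[i]));
                }
                result.add(instance);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot hydrate " + type.getName(), e);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Point> points = hydrate(Point.class, Arrays.asList("x,extra", "7,ignored"));
        System.out.println(points.get(0).getX() + " " + points.get(0).getLabel()); // prints "7 n/a"
    }
}
```

In this sketch the "extra" column is skipped because Point has no such field, and label keeps its declared default because no "label" column ever triggers its setter.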

License and Source Code

Csv4j is an open source project licensed under the Apache License, Version 2.0.
The code is hosted on GitHub in the csv4j repository.
Comments, pull requests and any other kind of contribution are more than welcome.
Maven is used for building and dependency management.

Dependencies

Csv4j's only compile-time dependency is Guava. It is used for wrapping primitive types into their wrapper types (see com.google.common.primitives.Primitives) and for checking method preconditions.
For testing, csv4j uses TestNG.

Release

Csv4j is released to the Maven Central repository. Add it as a dependency to your project with Maven:

or Gradle:

Use Cases

Use case 1

Let's say you have the following 3 CSV files: data.csv, additionalField.csv and missingField.csv.

Let's also say that the fields relevant to your app are field0, field1 and field2. Notice that only data.csv has exactly this set of fields: additionalField.csv has one field more (field4), while missingField.csv has one field fewer (field1 is missing). This is a typical scenario when you have to cope with incomplete data. Csv4j ignores additional fields, while for missing fields the setter method is never called (consider giving the field a default value in its declaration, if you wish).
You would model this info in a domain type, let's call it SimpleDomainType, as follows:
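
A minimal sketch of what such a type might look like is given here. The field types (int, String, double) are assumptions inferred from the remark in use case 2 about Integer and Double, the "unknown" default is purely illustrative, and the demo class is hypothetical; the author's actual listing is in SimpleDomainType.java.

```java
// Sketch of a domain type for use case 1; the field types are assumptions.
class SimpleDomainType {
    private int field0;
    private String field1 = "unknown"; // default that survives when a file has no field1 column
    private double field2;

    // Public no-argument constructor, as required by the basic assumptions.
    public SimpleDomainType() { }

    // One standard setter per field that should be filled from CSV data.
    public void setField0(int field0) { this.field0 = field0; }
    public void setField1(String field1) { this.field1 = field1; }
    public void setField2(double field2) { this.field2 = field2; }

    public int getField0() { return field0; }
    public String getField1() { return field1; }
    public double getField2() { return field2; }
}

class SimpleDomainTypeDemo {
    public static void main(String[] args) {
        // Simulates a row from a file without field1: its setter is never called.
        SimpleDomainType row = new SimpleDomainType();
        row.setField0(1);
        row.setField2(2.5);
        System.out.println(row.getField1()); // prints "unknown"
    }
}
```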

Mapping each CSV file into a list of SimpleDomainType objects can be done by:

Find the complete code for this example at SimpleDomainType.java and HydratorTest.java (simpleDomainType test).

Use case 2


Java fields are not restricted to String and primitive types, as in use case 1; they can be of any type T as long as T defines a static factory method valueOf: String => T. This method is needed to map the string read from the CSV file to an instance of T. Fields field0 and field2 in use case 1 were handled through their wrapper types, Integer and Double respectively, both of which define a suitable valueOf method. To make this clear, you could write another domain type, let's call it ComplexDomainType, as follows:

where MyInt is defined as follows:
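
As a rough sketch of the valueOf requirement, a type like MyInt could be written as shown here. The body of MyInt (a wrapped int, trimming of the input, an intValue accessor) and the MyIntDemo class are assumptions made for illustration; the author's version is in the repository.

```java
import java.lang.reflect.Method;

// Sketch of a custom field type; only the static valueOf(String) factory is required.
final class MyInt {
    private final int value;

    private MyInt(int value) { this.value = value; }

    // Static factory with the String => MyInt shape described above.
    public static MyInt valueOf(String text) {
        return new MyInt(Integer.parseInt(text.trim()));
    }

    public int intValue() { return value; }
}

class MyIntDemo {
    public static void main(String[] args) throws ReflectiveOperationException {
        // A reflection-based library can look the factory up by name and parameter type.
        Method factory = MyInt.class.getMethod("valueOf", String.class);
        MyInt parsed = (MyInt) factory.invoke(null, "42");
        System.out.println(parsed.intValue()); // prints 42
    }
}
```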

The following code maps a CSV file into a list of ComplexDomainType objects:

Find the complete code for this example at ComplexDomainType.java and HydratorTest.java (complexDomainType test).

Use case 3

 

Consider now that you have the following 2 CSV files:

And let's say you have a domain type AnnotatedDomainType with fields:
  •  int field0;
  •  String att1;
  •  double att2;
Obviously, Java field field0 will be matched with CSV field field0. You want to match Java field att2 with CSV field field2. Finally, you know that field1 in data.csv and field3 in data2.csv refer to the same entity, so you want Java field att1 to match both field1 and field3. You can achieve this by defining AnnotatedDomainType as follows:

The following code maps a CSV file into a list of AnnotatedDomainType objects:
 
The current assumption is that each CSV file contains either field1 or field3, but not both. If a file contains both, the setAtt1 method is called twice and att1 ends up with the value of whichever of the two fields appears last in the CSV header line.
Find the complete code for this example at AnnotatedDomainType.java and HydratorTest.java (annotatedDomainType test).
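
To illustrate the matching rule of this use case (annotation names if present, otherwise the Java field name), here is a small self-contained sketch. The annotation CsvNames and the class NameResolver are hypothetical names invented for this example; csv4j's real annotation may be named and shaped differently. Setters and the constructor are omitted to keep the sketch short.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Arrays;

// Hypothetical annotation; it must be retained at runtime to be visible via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface CsvNames {
    String[] value();
}

// Sketch of the domain type from this use case (setters omitted for brevity).
class AnnotatedDomainType {
    int field0;                                   // no annotation: matched by its own name
    @CsvNames({"field1", "field3"}) String att1;  // matches either CSV field
    @CsvNames("field2") double att2;
}

class NameResolver {
    // Returns the Java field that a CSV header name maps to, or null if there is none.
    static Field resolve(Class<?> type, String csvName) {
        for (Field f : type.getDeclaredFields()) {
            CsvNames names = f.getAnnotation(CsvNames.class);
            boolean matches = (names == null)
                    ? f.getName().equals(csvName)
                    : Arrays.asList(names.value()).contains(csvName);
            if (matches) return f;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(resolve(AnnotatedDomainType.class, "field3").getName()); // prints att1
    }
}
```

In this sketch an annotated field is matched only by its annotation values, so a CSV column literally named att1 would not match; whether csv4j behaves the same way is not stated in this post.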

Outro

That's it. I hope you enjoy csv4j as much as I do. Until the next post, have fun and take a look at Sane.java, but that deserves a separate post. Stay tuned.
