Simple is better than complex
Nowadays, data storage is cheap (almost free), but bandwidth is costly (think about the monthly fees paid for smartphone data plans and home internet access). As more and more things (watches, refrigerators, cars, etc.) are connected to each other through the internet, online traffic could become as notorious as Freeway 405 in Greater Los Angeles. Apache Avro can help in this situation with its compact binary data representation, since Avro data is much more compact than text-based XML or JSON.
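To make the size difference concrete, here is a minimal sketch in plain Python (no Avro library) that hand-encodes one record the way the Avro binary encoding does: a string is written as a zigzag varint length followed by its UTF-8 bytes, and a record is just its fields concatenated in schema order, with no field names on the wire. The helper names and the sample values are made up for illustration.

```python
import json


def zigzag_varint(n: int) -> bytes:
    """Encode an integer as Avro encodes int/long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small negatives become small positives
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def avro_string(s: str) -> bytes:
    """Avro string: length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


def encode_name(first_name: str, last_name: str) -> bytes:
    """A record is the concatenation of its fields in schema order."""
    return avro_string(first_name) + avro_string(last_name)


# Illustrative sample values, not taken from any real data set.
record = {"first_name": "Jackie", "last_name": "Robinson"}
as_json = json.dumps(record).encode("utf-8")
as_avro = encode_name(record["first_name"], record["last_name"])

print("JSON bytes:", len(as_json))
print("Avro bytes:", len(as_avro))
```

The binary form is smaller because the field names live in the schema rather than in every record, which is also why the reader needs the schema to decode the bytes.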
In another blog post, I described using Apache Avro schemas to model data as a replacement for XML and JSON. After that, I developed AvroHubTools to author Avro schemas with Apache Avro IDL. In this post, I will demonstrate using Avro IDL together with AvroHubTools to generate Avro schemas, covering the topics below:
- How to encapsulate Avro schemas and build a standalone schema out of multiple external, independent Avro schema files.
- How to reuse Avro schemas through inheritance.
- How to apply polymorphism in data modeling by Avro schemas.
AvroHubTools (obf-avro-hub-tools-1.0-SNAPSHOT.jar) can be downloaded from this link and has been tested with JDK 1.6.45 and the Apache Avro 1.7.7 release. AvroHubTools can support other Apache Avro versions; please request one by leaving a comment on this blog.
Encapsulating (data hiding) with Apache Avro Schemas
Encapsulation here means packing data into a single Avro schema. Suppose we want to model a baseball team as an Avro schema. The components are player, playing position and coach, plus name, since both player and coach reference the name component. A player can be assigned to multiple coaches. The Avro IDL file baseball.avdl packs the four Avro schema files together in order to generate the Team schema.
baseball.avdl is defined as follows.
@namespace("avro.examples.baseball")
protocol Baseball {
  import schema "name.avsc";
  import schema "coache.avsc";
  import schema "position.avsc";
  import schema "player.avsc";
  record Team {
    Player player;
    Coache coache;
  }
}
Below are the Team schema components: name.avsc, position.avsc, coache.avsc and player.avsc.
name.avsc
{"type":"record", "name":"Name", "namespace": "avro.examples.baseball",
"fields": [
{"name": "first_name", "type": "string"},
{"name": "last_name", "type": "string"}
]
}
position.avsc
{"type":"enum", "name": "Position", "namespace": "avro.examples.baseball",
"symbols": ["P", "C", "B1", "B2", "B3", "SS", "LF", "CF", "RF", "DH"]
}
coache.avsc
{"type":"record", "name":"Coache", "namespace": "avro.examples.baseball",
"fields": [
{"name": "name", "type": "Name"}
]
}
player.avsc
{"type":"record", "name":"Player", "namespace": "avro.examples.baseball",
"fields": [
{"name": "number", "type": "int"},
{"name": "name", "type": "Name"},
{"name": "positions", "type": {"type": "array", "items": "Position"} },
{"name": "coaches", "type": {"type": "array", "items": "Coache"} }
]
}
To generate the Team schema (written to baseballTeam.avsc), run the following command with AvroHubTools.
java -jar obf-avro-hub-tools-1.0-SNAPSHOT.jar idlavsc baseball.avdl avro.examples.baseball.Team baseballTeam.avsc
Here is the generated baseballTeam.avsc:
{
"type" : "record",
"name" : "Team",
"namespace" : "avro.examples.baseball",
"fields" : [ {
"name" : "player",
"type" : {
"type" : "record",
"name" : "Player",
"fields" : [ {
"name" : "number",
"type" : "int"
}, {
"name" : "name",
"type" : {
"type" : "record",
"name" : "Name",
"fields" : [ {
"name" : "first_name",
"type" : "string"
}, {
"name" : "last_name",
"type" : "string"
} ]
}
}, {
"name" : "positions",
"type" : {
"type" : "array",
"items" : {
"type" : "enum",
"name" : "Position",
"symbols" : [ "P", "C", "B1", "B2", "B3", "SS", "LF", "CF", "RF", "DH" ]
}
}
}, {
"name" : "coaches",
"type" : {
"type" : "array",
"items" : {
"type" : "record",
"name" : "Coache",
"fields" : [ {
"name" : "name",
"type" : "Name"
} ]
}
}
} ]
}
}, {
"name" : "coache",
"type" : "Coache"
} ]
}
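The generated schema follows a simple pattern: each named type (Player, Name, Position, Coache) is written out in full the first time it appears, and every later use refers to it by name only. The sketch below reproduces that pattern in plain Python to show the idea. It is not how AvroHubTools or the Avro IDL compiler is implemented; the `inline` function and the in-memory `registry` are hypothetical, and namespaces are left off the component schemas for brevity.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def inline(schema, registry, seen):
    """Expand named references on first use; keep later uses as plain names."""
    if isinstance(schema, str):
        if schema in PRIMITIVES or schema in seen or schema not in registry:
            return schema  # primitive, already defined, or unknown name
        return inline(registry[schema], registry, seen)
    if isinstance(schema, list):  # union: resolve each branch
        return [inline(s, registry, seen) for s in schema]
    kind = schema["type"]
    if kind == "record":
        seen.add(schema["name"])  # mark as defined before visiting the fields
        fields = [dict(f, type=inline(f["type"], registry, seen))
                  for f in schema["fields"]]
        return dict(schema, fields=fields)
    if kind == "enum":
        seen.add(schema["name"])
        return schema
    if kind == "array":
        return dict(schema, items=inline(schema["items"], registry, seen))
    if kind == "map":
        return dict(schema, values=inline(schema["values"], registry, seen))
    return schema


# Hypothetical in-memory stand-in for the four .avsc files plus the Team record.
registry = {
    "Name": {"type": "record", "name": "Name", "fields": [
        {"name": "first_name", "type": "string"},
        {"name": "last_name", "type": "string"}]},
    "Position": {"type": "enum", "name": "Position",
                 "symbols": ["P", "C", "B1", "B2", "B3", "SS", "LF", "CF", "RF", "DH"]},
    "Coache": {"type": "record", "name": "Coache", "fields": [
        {"name": "name", "type": "Name"}]},
    "Player": {"type": "record", "name": "Player", "fields": [
        {"name": "number", "type": "int"},
        {"name": "name", "type": "Name"},
        {"name": "positions", "type": {"type": "array", "items": "Position"}},
        {"name": "coaches", "type": {"type": "array", "items": "Coache"}}]},
    "Team": {"type": "record", "name": "Team", "namespace": "avro.examples.baseball",
             "fields": [{"name": "player", "type": "Player"},
                        {"name": "coache", "type": "Coache"}]},
}

team = inline(registry["Team"], registry, set())
print(json.dumps(team, indent=2))
```

Because Coache is first reached through the coaches array inside Player, that is where its full definition lands, and the top-level coache field of Team ends up as the bare name "Coache", just as in the generated file above.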
To be continued...