AVRO Schemas & APIs
AVRO By Generating a Class
AVRO Using Parsers Library
AVRO Useful Resources
Selected Reading
- Who is Who
- Computer Glossary
- HR Interview Questions
- Effective Resume Writing
- Questions and Answers
- UPSC IAS Exams Notes
AVRO - Seriapzation Using Parsers
One can read an Avro schema into a program either by generating a class corresponding to a schema or by using the parsers pbrary. In Avro, data is always stored with its corresponding schema. Therefore, we can always read a schema without code generation.
This chapter describes how to read the schema by using parsers pbrary and to seriapze the data using Avro.
Seriapzation Using Parsers Library
To seriapze the data, we need to read the schema, create data according to the schema, and seriapze the schema using the Avro API. The following procedure seriapzes the data without generating any code −
Step 1
First of all, read the schema from the file. To do so, use Schema.Parser class. This class provides methods to parse the schema in different formats.
Instantiate the Schema.Parser class by passing the file path where the schema is stored.
Schema schema = new Schema.Parser().parse(new File("/path/to/emp.avsc"));
Step 2
Create the object of GenericRecord interface, by instantiating GenericData.Record class as shown below. Pass the above created schema object to its constructor.
GenericRecord e1 = new GenericData.Record(schema);
Step 3
Insert the values in the schema using the put() method of the GenericData class.
e1.put("name", "ramu"); e1.put("id", 001); e1.put("salary",30000); e1.put("age", 25); e1.put("address", "chennai");
Step 4
Create an object of DatumWriter interface using the SpecificDatumWriter class. It converts Java objects into in-memory seriapzed format. The following example instantiates SpecificDatumWriter class object for emp class −
DatumWriter<emp> empDatumWriter = new SpecificDatumWriter<emp>(emp.class);
Step 5
Instantiate DataFileWriter for emp class. This class writes seriapzed records of data conforming to a schema, along with the schema itself, in a file. This class requires the DatumWriter object, as a parameter to the constructor.
DataFileWriter<emp> dataFileWriter = new DataFileWriter<emp>(empDatumWriter);
Step 6
Open a new file to store the data matching to the given schema using create() method. This method requires the schema, and the path of the file where the data is to be stored, as parameters.
In the example given below, schema is passed using getSchema() method and the data file is stored in the path
/home/Hadoop/Avro/seriapzed_file/emp.avro.
empFileWriter.create(e1.getSchema(), new File("/home/Hadoop/Avro/seriapzed_file/emp.avro"));
Step 7
Add all the created records to the file using append( ) method as shown below.
empFileWriter.append(e1); empFileWriter.append(e2); empFileWriter.append(e3);
Example – Seriapzation Using Parsers
The following complete program shows how to seriapze the data using parsers −
import java.io.File; import java.io.IOException; import org.apache.avro.Schema; import org.apache.avro.file.DataFileWriter; import org.apache.avro.generic.GenericData; import org.apache.avro.generic.GenericDatumWriter; import org.apache.avro.generic.GenericRecord; import org.apache.avro.io.DatumWriter; pubpc class Seriap { pubpc static void main(String args[]) throws IOException{ //Instantiating the Schema.Parser class. Schema schema = new Schema.Parser().parse(new File("/home/Hadoop/Avro/schema/emp.avsc")); //Instantiating the GenericRecord class. GenericRecord e1 = new GenericData.Record(schema); //Insert data according to schema e1.put("name", "ramu"); e1.put("id", 001); e1.put("salary",30000); e1.put("age", 25); e1.put("address", "chenni"); GenericRecord e2 = new GenericData.Record(schema); e2.put("name", "rahman"); e2.put("id", 002); e2.put("salary", 35000); e2.put("age", 30); e2.put("address", "Delhi"); DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema); DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter); dataFileWriter.create(schema, new File("/home/Hadoop/Avro_work/without_code_gen/mydata.txt")); dataFileWriter.append(e1); dataFileWriter.append(e2); dataFileWriter.close(); System.out.println(“data successfully seriapzed”); } }
Browse into the directory where the generated code is placed. In this case, at home/Hadoop/Avro_work/without_code_gen.
$ cd home/Hadoop/Avro_work/without_code_gen/
Now copy and save the above program in the file named Seriapze.java. Compile and execute it as shown below −
$ javac Seriapze.java $ java Seriapze
Output
data successfully seriapzed
If you verify the path given in the program, you can find the generated seriapzed file as shown below.
Advertisements