[geomesa-users] Ingest data through Spark

Hi,

May I ask a question about using AccumuloFeatureWriter to ingest data in Spark? The idea is to make the AccumuloFeatureWriter a singleton for each Spark executor so that it can be reused across records and closed at the end (the holder class is sketched after the snippets below).

I am relatively new to Spark and my code (in Java) looks like this:

dataRDD.foreachPartition(iterator -> {
    // get or init the writer as a static variable in MyTool.class
    AppendAccumuloFeatureWriter writer = MyTool.getSingletonWriter(...);
    while (iterator.hasNext()) {
        // build a SimpleFeature for each record and copy it onto the writer
        SimpleFeature feature = buildFeature(iterator.next());
        FeatureUtils.copyToWriter(writer, feature, true);
        writer.write();
    }
});

dataRDD.foreachPartition(iterator -> {
    // close the writer if it hasn't been closed yet
    MyTool.closeWriter();
});
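
In case it helps, MyTool is roughly the following (a simplified sketch, not my exact code: I am using the generic GeoTools FeatureWriter interface here and placeholder DataStore/typeName parameters instead of my real setup):

import java.io.IOException;
import org.geotools.data.DataStore;
import org.geotools.data.FeatureWriter;
import org.geotools.data.Transaction;
import org.opengis.feature.simple.SimpleFeature;
import org.opengis.feature.simple.SimpleFeatureType;

public class MyTool {
    // one writer per executor JVM, shared by the tasks running on that executor
    private static FeatureWriter<SimpleFeatureType, SimpleFeature> writer;

    public static synchronized FeatureWriter<SimpleFeatureType, SimpleFeature>
            getSingletonWriter(DataStore ds, String typeName) throws IOException {
        if (writer == null) {
            writer = ds.getFeatureWriterAppend(typeName, Transaction.AUTO_COMMIT);
        }
        return writer;
    }

    public static synchronized void closeWriter() throws IOException {
        if (writer != null) {
            writer.close();  // closing flushes any buffered mutations
            writer = null;
        }
    }
}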


Problem: A random subset of the data may not be saved to Accumulo this way. I found a workaround, which is to always call writer.flush() after writer.write(), but that is very inefficient. May I ask if there are better ways to use AccumuloFeatureWriter, and what might cause this strange data loss on write? Thank you!
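
For reference, the only way I have found to avoid the loss is flushing after every single write, roughly like this inside the loop:

    SimpleFeature feature = buildFeature(iterator.next());
    FeatureUtils.copyToWriter(writer, feature, true);
    writer.write();
    writer.flush();  // forces the record out immediately; works, but makes ingest very slow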

Regards
Yikai Gong
