Re: [geomesa-users] Storm Ingest

Nathan,

Just to make sure (you may already know this): you also need to buffer the tuples that you write to the batch writer (BW) in some sort of list or queue in whatever bolt does the writing. Then you need to periodically flush the batch writer and ack the tuples in that list that were covered by the flush. Just adding them to the GeoMesa feature source or writer does not guarantee they get written, because of the buffer in the BW.

If you need example code I can dig it up. I use a tick tuple to do most of that, and flush if it has been 60s since my last flush or if some maximum number of tuples per flush is exceeded. It depends on your latency requirements. The key is generally to set the BW's max mutation age lower than the interval of your Storm-initiated flush of the BW.

Andrew
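
For readers following the thread, the bookkeeping described in the message above can be sketched without any Storm or Accumulo dependency. This is not Andrew's code. The class name PendingAcks, its methods, and the thresholds are hypothetical, and the sketch only tracks which tuples are waiting for a flush. In a real bolt, the writer's flush and the collector's ack calls would happen where the comments indicate.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: holds tuples that were written but not yet flushed or acked. */
class PendingAcks<T> {
    private final int maxPending;      // assumed limit: max tuples per flush
    private final long maxAgeMillis;   // assumed limit: max time between flushes, e.g. 60_000
    private final List<T> pending = new ArrayList<>();
    private long lastFlushMillis;

    PendingAcks(int maxPending, long maxAgeMillis, long nowMillis) {
        this.maxPending = maxPending;
        this.maxAgeMillis = maxAgeMillis;
        this.lastFlushMillis = nowMillis;
    }

    /** Call after handing the tuple's feature to the writer. The tuple is NOT acked yet. */
    void add(T tuple) {
        pending.add(tuple);
    }

    /** Check on every tick tuple and after every add. */
    boolean flushDue(long nowMillis) {
        return pending.size() >= maxPending
            || nowMillis - lastFlushMillis >= maxAgeMillis;
    }

    /** Call only after the writer's flush has returned; ack every tuple in the result. */
    List<T> drain(long nowMillis) {
        List<T> flushed = new ArrayList<>(pending);
        pending.clear();
        lastFlushMillis = nowMillis;
        return flushed;
    }

    public static void main(String[] args) {
        PendingAcks<String> acks = new PendingAcks<>(3, 60_000L, 0L);
        acks.add("t1");
        acks.add("t2");
        System.out.println(acks.flushDue(1_000L));  // false: 2 tuples, 1s since last flush
        acks.add("t3");
        System.out.println(acks.flushDue(2_000L));  // true: tuple limit reached
        // A real bolt would flush the writer here, then ack each drained tuple.
        System.out.println(acks.drain(2_000L));     // [t1, t2, t3]
    }
}
```

If the flush throws, the tuples would stay in the buffer (or be failed) rather than acked, so that Storm replays them.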

On 06/06/2016 01:43 PM, Nathan Mercer wrote:

Thank you Emilio,

I am using anchoring and acknowledgements already.

However, the USE_PROVIDED_FID hint works. I was not aware of this. With this hint, I don’t need the exactly-once guarantee.

Cheers,

Nathan

From: geomesa-users-bounces@xxxxxxxxxxxxxxxx [mailto:geomesa-users-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Emilio Lahr-Vivaz
Sent: Monday, June 6, 2016 7:15 AM
To: geomesa-users@xxxxxxxxxxxxxxxx
Subject: Re: [geomesa-users] Storm Ingest

Hi Nathan,

Have you considered just using anchoring and acknowledgements to track your tuples? You don't need Trident for the basics. You might not get 'exactly once', but it should be closer. See:

http://storm.apache.org/releases/1.0.1/Guaranteeing-message-processing.html

Also, in general, re-writing a feature to GeoMesa will not cause any problems as long as the feature has not changed. So you should be fine re-writing the occasional feature. Just make sure that you use the GeoTools hint to set the feature ID to your provided value; otherwise GeoMesa will generate a new feature ID and you will get duplicate entries. See:

http://docs.geotools.org/latest/javadocs/org/geotools/factory/Hints.html#USE_PROVIDED_FID
https://github.com/geotools/geotools/wiki/allow-inserts-to-use-existing-feature-id

Thanks,

Emilio
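
To illustrate why a provided feature ID makes a replayed write harmless, here is a toy in-memory model. It is not GeoMesa or GeoTools code, and ToyFeatureStore and its methods are hypothetical. In GeoTools itself the hint is typically set per feature, along the lines of feature.getUserData().put(Hints.USE_PROVIDED_FID, Boolean.TRUE).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical toy store keyed by feature ID; stands in for the real data store. */
class ToyFeatureStore {
    private final Map<String, String> rows = new HashMap<>();

    /** Models a write with the provided-ID hint: the caller's ID is the key. */
    void writeWithProvidedId(String fid, String attributes) {
        rows.put(fid, attributes);
    }

    /** Models a write without the hint: a fresh ID is generated on every write. */
    void writeWithGeneratedId(String attributes) {
        rows.put(UUID.randomUUID().toString(), attributes);
    }

    int size() {
        return rows.size();
    }

    public static void main(String[] args) {
        ToyFeatureStore store = new ToyFeatureStore();

        // The same tuple delivered twice (at-least-once) with a stable ID: one row.
        store.writeWithProvidedId("parcel-1", "POLYGON A");
        store.writeWithProvidedId("parcel-1", "POLYGON A");
        System.out.println(store.size());  // 1

        // The same tuple delivered twice with generated IDs: two more rows.
        store.writeWithGeneratedId("POLYGON A");
        store.writeWithGeneratedId("POLYGON A");
        System.out.println(store.size());  // 3
    }
}
```

The stable ID has to come from the data itself (for example, a key carried in the source record), so that a replayed tuple maps to the same ID.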

On 06/03/2016 03:15 PM, Nathan Mercer wrote:

Hi there,

I have tried implementing the Storm ingest example and I was successful. I was also able to apply it to my own data, which is polygons stored in Shapefiles.

However, using Storm and Kafka you only get an at-least-once guarantee, which means you may ingest data more than once. And I have actually seen this happen with my implementation.

Apparently, using Storm’s Trident you are able to get an exactly-once guarantee. But I have tried and tried and cannot get Trident working. None of the examples I have found online for Trident have a similar use case.

Has anybody been able to implement Trident for ingest? Does it even make sense?

Thanks,

Nathan Mercer

_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://www.locationtech.org/mailman/listinfo/geomesa-users

 




