Datashake Unique ID


What changed?

We've recently added an exciting new metadata field to the reviews collected by the Review Scraper API: datashake_review_uuid.

This field assigns an immutable, unique alphanumeric identifier to each review collected from any source supported by the Review Scraper API. This means that the same review on the same source will have the same datashake_review_uuid, even if it is collected by different jobs or from different inputs. When a review is updated or edited, the datashake_review_uuid should also remain unchanged.
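Because the identifier is meant to survive edits, a stored copy of a review can be matched against a newly collected copy. The sketch below is illustrative only. It assumes the datashake_review_uuid has already been lifted out of meta_data into a top-level key, and it uses a hypothetical "review_text" field for the review body; match_by_uuid is not part of the API.

```python
def match_by_uuid(stored, fresh):
    """Compare stored reviews with freshly collected ones by datashake_review_uuid."""
    # Assumption: each review dict carries the identifier as a top-level key.
    stored_by_uuid = {r["datashake_review_uuid"]: r for r in stored}
    fresh_by_uuid = {r["datashake_review_uuid"]: r for r in fresh}

    # Identifiers seen for the first time in the fresh collection.
    new = [u for u in fresh_by_uuid if u not in stored_by_uuid]
    # Identifiers present in both collections whose text differs.
    # "review_text" is a hypothetical field name used for illustration.
    edited = [
        u for u in fresh_by_uuid
        if u in stored_by_uuid
        and fresh_by_uuid[u]["review_text"] != stored_by_uuid[u]["review_text"]
    ]
    # Identifiers that were stored earlier but did not come back this time.
    missing = [u for u in stored_by_uuid if u not in fresh_by_uuid]
    return new, edited, missing


stored = [
    {"datashake_review_uuid": "uuid-a", "review_text": "Good"},
    {"datashake_review_uuid": "uuid-b", "review_text": "Bad"},
]
fresh = [
    {"datashake_review_uuid": "uuid-a", "review_text": "Good, and fast delivery"},
    {"datashake_review_uuid": "uuid-c", "review_text": "Okay"},
]
new, edited, missing = match_by_uuid(stored, fresh)
```

A review that appears in "missing" may have been deleted on the source, but it may also simply fall outside what the latest job collected, so treat that list as a set of candidates to check.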


Why?

The goal of this identifier is to provide an easy way, across all Review Scraper API supported sources, to deduplicate reviews. The datashake_review_uuid provides a source of truth to identify if any two (or more) reviews are the same. If you are attempting to re-sync reviews or hoping to identify updated or deleted reviews, this new identifier can be used to easily match against existing reviews.
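As a rough illustration of deduplication, the sketch below keeps the first review seen for each datashake_review_uuid. The helper names are hypothetical, and the code assumes that meta_data may arrive either as an object or as a JSON-encoded string, so it handles both.

```python
import json


def review_uuid(review):
    """Return the review's datashake_review_uuid, or None if it is absent."""
    meta = review.get("meta_data")
    # Assumption: meta_data may be a JSON-encoded string or an already-parsed dict.
    if isinstance(meta, str):
        try:
            meta = json.loads(meta)
        except json.JSONDecodeError:
            meta = None
    return meta.get("datashake_review_uuid") if isinstance(meta, dict) else None


def deduplicate(reviews):
    """Keep the first review seen for each datashake_review_uuid."""
    seen = set()
    unique = []
    for review in reviews:
        uuid = review_uuid(review)
        if uuid is None:
            # No identifier available: keep the review rather than guess.
            unique.append(review)
            continue
        if uuid not in seen:
            seen.add(uuid)
            unique.append(review)
    return unique


reviews = [
    {"id": 1, "meta_data": {"datashake_review_uuid": "uuid-a"}},
    {"id": 2, "meta_data": {"datashake_review_uuid": "uuid-b"}},
    {"id": 3, "meta_data": '{"datashake_review_uuid": "uuid-a"}'},
]
deduped = deduplicate(reviews)
```

Here the third review shares an identifier with the first, so only the reviews with "id" 1 and 2 are kept.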


Methodology

The datashake_review_uuid was developed by Datashake Data Scientists using a proprietary algorithm that takes into consideration various data fields and review site structures. Some of these components include the review author, date, and source. The identifier was tested on more than 10 million data points.


Limitations

It is nearly impossible to develop a perfect unique identifier, so it's important to be aware that the identifier will not always be 100% accurate. Our datashake_review_uuid methodology prioritizes precision over recall, meaning we'll only assign the same datashake_review_uuid if we're sure that two reviews are the same. This approach ensures that customers using the datashake_review_uuid to deduplicate data won't lose any distinct reviews, at the cost that some duplicates may remain in the data.

Please share any datashake_review_uuid feedback with us at support@datashake.com so we can incorporate your feedback into continuous improvements.


Frequently Asked Questions


Question #1:

There are various review IDs returned from Review Scraper API. What is the difference?

Answer:

  1. id - this ID is the unique identifier in our data store and increments with every new record that is saved. It is a numeric ID applied to every review returned in a Review Scraper API response. Every single review returned from the API will have a unique "id" value, even if the same review is collected as part of different requests.
    • Example: "reviews": [ { "id": 46928

  2. unique_id - this is an external hash that some review websites provide to identify unique reviews. It is not supported for all sources, and its format is not standard and can be defined differently across sources. For example, some sources provide updated IDs when a review is modified, and some do not. The unique_id is completely dependent on the external website and not under Datashake's control.
    • Example: "reviews": [ {"unique_id": null

  3. datashake_review_uuid - a Datashake review identifier applied to every review returned in a Review Scraper API (RSAPI) response, which remains unchanged for the same review on the same source.
    • Example: "reviews": [ {"meta_data": "datashake_review_uuid":"9536cf78-af20-32b0-b4fb-c1a6f2ef7d30"}

Question #2:

Does the datashake_review_uuid work across all sources?

Answer:

Yes, it is a mandatory review field for every new call to Review Scraper API across all sources. There are no exceptions here.


Question #3:

Where can I find the datashake_review_uuid?

Answer:

The datashake_review_uuid is included in the meta_data field of each review in the response from the Review Scraper API get reviews endpoint.
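A minimal sketch of reading the field from a single review in the response might look like the following. The function name is hypothetical. Whether meta_data arrives as a JSON-encoded string or as a nested object is an assumption here, so the code accepts either form and returns None when the identifier cannot be found.

```python
import json
from typing import Optional


def get_datashake_review_uuid(review: dict) -> Optional[str]:
    """Return the datashake_review_uuid from a review's meta_data, or None."""
    meta = review.get("meta_data")
    # Assumption: meta_data may be a JSON-encoded string; decode it if so.
    if isinstance(meta, str):
        try:
            meta = json.loads(meta)
        except json.JSONDecodeError:
            return None
    # After decoding, only a dict can hold the identifier.
    if isinstance(meta, dict):
        return meta.get("datashake_review_uuid")
    return None


review = {
    "id": 46928,
    "meta_data": '{"datashake_review_uuid": "9536cf78-af20-32b0-b4fb-c1a6f2ef7d30"}',
}
uuid = get_datashake_review_uuid(review)
```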


Question #4:

Have old reviews been populated with datashake_review_uuid?

Answer:

No, only reviews collected since February 2024 include the datashake_review_uuid. If you have a critical need to backfill historical data, please reach out to us.


Question #5:

Will duplicate review content within the same site have the same datashake_review_uuid?

Answer:

Yes.


Question #6:

Will duplicate review content republished across different sites (syndicated reviews) have the same datashake_review_uuid?

Answer:

No, reviews on different sources will always have different datashake_review_uuids.


Question #7:

How do I know if the same review was crawled twice?

Answer:

The "datashake_review_uuid" will be the same, even though the "id" will be different.
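One way to spot repeat crawls, sketched under assumptions, is to group the "id" values by datashake_review_uuid and report only the groups with more than one entry. The function name is hypothetical, the second "id" value is made up for illustration, and meta_data is assumed to have been decoded into a dict beforehand.

```python
from collections import defaultdict


def ids_crawled_more_than_once(reviews):
    """Map each datashake_review_uuid seen more than once to its list of ids."""
    groups = defaultdict(list)
    for review in reviews:
        # Assumption: meta_data has already been decoded into a dict.
        meta = review.get("meta_data") or {}
        uuid = meta.get("datashake_review_uuid")
        if uuid is not None:
            groups[uuid].append(review["id"])
    # Only identifiers attached to several ids indicate a repeat crawl.
    return {uuid: ids for uuid, ids in groups.items() if len(ids) > 1}


reviews = [
    {"id": 46928, "meta_data": {"datashake_review_uuid": "9536cf78-af20-32b0-b4fb-c1a6f2ef7d30"}},
    {"id": 51234, "meta_data": {"datashake_review_uuid": "9536cf78-af20-32b0-b4fb-c1a6f2ef7d30"}},
    {"id": 51235, "meta_data": {"datashake_review_uuid": "another-uuid"}},
]
repeats = ids_crawled_more_than_once(reviews)
```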


Question #8:

Is the unique_id available across all sources supported by the Review Scraper API?

Answer:

No, the unique_id is not supported across all sources, nor is it consistent. This field is provided by some external review sites and is not modified by Datashake at all. If you want a standard, consistent identifier across all sources, you can use the "datashake_review_uuid".
