Property talk:P3134

Latest comment: 2 years ago by Vicarage in topic Scraping

Documentation

TripAdvisor ID
identifier of a place (region, hotel, restaurant, attraction), in TripAdvisor
Associated item: Tripadvisor (Q1770710)
Applicable "stated in" value: Tripadvisor (Q1770710)
Data type: External identifier
Allowed values: [1-9][0-9]{0,7}
Usage notes: The ID is from the URL of a place's page, specifically the digits following the "-d" if the page has one, or following the "-g" if not. Examples where Xs are digits: gXXX-dXXX or Tourism-gXXX.
Example:
Saint Paul hermitage (Q26389271) → 4037786
Tabriz Bazaar (Q4399) → 324092
Tambaú Hotel (Q10298625) → 306312
London (Q84) → 186338
Machines of the Isle of Nantes (Q1820547) → 1860780
Source: https://www.tripadvisor.com/
Formatter URL: https://www.tripadvisor.com/$1
Robot and gadget jobs: Convert P553 Q1770710 (P554 → $1) into TripAdvisor ID → $1
Tracking: usage: Category:Pages using Wikidata property P3134 (Q29110148)
Related to country: United States of America (Q30) (See 762 others)
See also: Booking.com hotel ID (P3607), Recreation.gov point of interest ID (P3714), Hotels.com hotel ID (P3898), Expedia hotel ID (P5651)
Lists
Proposal discussion
Current uses
Total: 28,187
Main statement: 28,053 (99.5% of uses)
Qualifier: 5 (<0.1% of uses)
Reference: 129 (0.5% of uses)
Format “[1-9][0-9]{0,7}”: value must be formatted using this pattern (PCRE syntax). (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P3134#Format, SPARQL
Distinct values: this property likely contains a value that is different from all other items. (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P3134#Unique value, SPARQL (every item), SPARQL (by value)
Item “instance of (P31)”: Items with this property should also have “instance of (P31)”. (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P3134#Item P31, search, SPARQL
Item “country (P17)”: Items with this property should also have “country (P17)”. (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303). Known exceptions: Europe (Q46), Antarctica (Q51), Asia (Q48), Africa (Q15), Middle East (Q7204), South America (Q18), South Pacific Ocean (Q12355425)
List of violations of this constraint: Database reports/Constraint violations/P3134#Item P17, search, SPARQL
Single value: this property generally contains a single value. (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303). Known exceptions: Chesapeake and Ohio Canal National Historical Park (Q5092894), no label (Q104658109), no label (Q124310978)
List of violations of this constraint: Database reports/Constraint violations/P3134#Single value, SPARQL
Allowed entity types are Wikibase item (Q29934200): the property may only be used on a certain entity type (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P3134#Entity types
Scope is as main value (Q54828448), as reference (Q54828450): the property must only be used in the specified ways (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P3134#Scope, SPARQL
 

Please notify projects that use this property before big changes (renaming, deletion, merge with another property, etc.)

Pattern ^g\d+-d(\d+)$ will be automatically replaced with \1.
Testing: TODO list
Pattern ^Tourism-g(\d+)$ will be automatically replaced with \1.
Testing: TODO list
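
The two replacement patterns above can be tried out locally. The following is a minimal sketch in Python, not the code that actually performs the replacement on Wikidata; the function names normalize_tripadvisor_id and is_valid_id are hypothetical, and the regular expressions are copied from the patterns and the format constraint quoted on this page.

```python
import re

# The two autofix patterns quoted above; both are replaced with group 1.
AUTOFIX_PATTERNS = [
    re.compile(r"^g\d+-d(\d+)$"),     # old destination form, e.g. g670770-d4037786
    re.compile(r"^Tourism-g(\d+)$"),  # old region form, e.g. Tourism-g1025218
]

# The format constraint of the property: 1 to 8 digits, no leading zero.
VALID_ID = re.compile(r"[1-9][0-9]{0,7}")


def normalize_tripadvisor_id(value: str) -> str:
    """Return the digits-only ID for an old-style value (hypothetical helper).

    Values that match neither pattern are returned unchanged.
    """
    for pattern in AUTOFIX_PATTERNS:
        match = pattern.match(value)
        if match:
            return match.group(1)
    return value


def is_valid_id(value: str) -> bool:
    """Check a value against the format constraint (hypothetical helper)."""
    return VALID_ID.fullmatch(value) is not None
```

For example, normalize_tripadvisor_id("g670770-d4037786") gives "4037786", which then passes is_valid_id, while the unconverted value does not.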

Warning

Hi, I don't know how to avoid the warning when I insert this ID. In Q18490819 I tried three different formats; they all work, but they all give me the small alert icon on the right. I tried to understand it (I read the comment) but so far I cannot fix it. I think the problem is that the format is structured for specific places and not for generic locations, which have a shorter code on Tripadvisor. Can someone show me how to fix it, or change the code here to accommodate this case too? Thank you. --Alexmar983 (talk) 13:42, 1 August 2018 (UTC)

@YMS: I ping you as the creator, if you don't mind. It's not urgent, but I am processing a lot of Tripadvisor pages these weeks, so it's the ideal moment to add those IDs for towns and hamlets, but the warning is there. --Alexmar983 (talk) 16:32, 9 August 2018 (UTC)
The second format you tried was almost correct according to the specified format. You just added an additional \, probably inspired by the regular expression given here. But there, the \ is a special character only present to escape the -, which otherwise would have a special meaning. Just copying the beginning of the URL should work fine. The first format you tried actually worked as well, but if I remember correctly, this is just because TripAdvisor does a lot of fuzzy searching in its URLs (as can be seen in the third format you tried, which also actually worked), and might result in ambiguous links. If I'm wrong here, the regular expression could indeed be simplified. --YMS (talk) 17:07, 9 August 2018 (UTC)

Two items for one ID

Hi, this ID covers Orsanmichele (Q860816) and Museum of Orsanmichele (Q2947745). What is the preferred strategy in this case? Should I apply it to both items? --Alexmar983 (talk) 10:34, 10 May 2019 (UTC)

@Alexmar983: IMHO you should create an item covering both the church and the museum, add the identifier to it, and then add has part(s) (P527) and part of (P361) as appropriate. --★ → Airon 90 13:00, 20 February 2020 (UTC)

Identifier could be reduced more

As of now, the identifier consists of two parts, each made of a letter and some digits, joined with a - (e.g. g670770-d4037786). I found out that it is possible to use just the last number (https://www.tripadvisor.com/4037786). Moreover, this code is used by the TripAdvisor API to get locations. --★ → Airon 90 12:56, 20 February 2020 (UTC)

@Airon90: I would Support the simplification of this property to just the numbers; it also makes sense to align with TripAdvisor themselves and what they consider the unique ID. Would be happy to write a bot to convert existing statements if there is community consensus. --SilentSpike (talk) 11:18, 21 February 2020 (UTC)
@Jura1: Why is a new proposal needed? Why couldn't a bot just change all the identifiers? --★ → Airon 90 14:57, 23 February 2020 (UTC)
How would a bot find all of them? --- Jura 14:37, 3 March 2020 (UTC)
SELECT DISTINCT ?item ?id where {
  ?item wdt:P3134 ?id.
  FILTER (contains(?id, "-"))
}
LIMIT 100
Seems simple enough. --SilentSpike (talk) 15:46, 3 March 2020 (UTC)
It's also worth noting these IDs come in two forms: some are like Tourism-gXXXXX while others are like gXXXXX-dXXXXX. It seems like the former applies to regions and the latter to specific destinations (the "g" number is the region and the "d" the destination). Tested, and it also looks like the URL found by @Airon90: works for the regions too (e.g. https://www.tripadvisor.com/1025218). --SilentSpike (talk) 17:38, 3 March 2020 (UTC)
That's just the first 100 values here at Wikidata with best rank. --- Jura 17:45, 3 March 2020 (UTC)
Right, because of the LIMIT 100, but a bot can go through and update these statements, after which they will no longer contain the substring "-", so the query will only ever return statements still in the old format. --SilentSpike (talk) 17:56, 3 March 2020 (UTC)
I hope use of this property isn't limited to best rank here at Wikidata. --- Jura 18:21, 3 March 2020 (UTC)
I'm not sure I follow? It's finding the items which have a statement using this property where the value contains a "-" substring. Items don't have a rank. Perhaps I am missing a SPARQL behaviour of some sort? --SilentSpike (talk) 18:35, 3 March 2020 (UTC)
It doesn't find uses in references, as qualifier, in ranks other than best rank, and, most of all, any uses of the property outside Wikidata. --- Jura 18:37, 3 March 2020 (UTC)
Ah I see, then yeah, a new property proposal and deprecation is the way to go for sure. --SilentSpike (talk) 18:58, 3 March 2020 (UTC)

Am I missing something, or can a bot change all items in the query, after which changing the format in format as a regular expression (P1793) will eventually trigger WD:Database reports/Constraint violations/P3134? I don't get what you mean by "property outside Wikidata": anybody using data from this property will receive new data which is not only human-usable, as it is now, but also machine-usable. We are not breaking anything, we are just implementing something better. --★ → Airon 90 19:19, 3 March 2020 (UTC)

How would you update already existing uses of the property (and its values) outside Wikidata? --- Jura 19:24, 3 March 2020 (UTC)
How could you tell people to change property and use a new one because this one would be deprecated?
Remember that this change doesn't break anything. Let's suppose it does. A change in the code of programs using this property is required. It would be required even if we create a new property and deprecate this one. --★ → Airon 90 19:40, 3 March 2020 (UTC)
@Airon90: The point of deprecation is that you replace a property with another without destroying existing data (only adding data). If we start changing the value of this property everywhere then suddenly downstream there's an unexpected value. If we instead remove the statements and add a new statement with expected new value then nothing is broken, downstream users would just see data as removed and have time to convert to using the new property since the old would be marked as deprecated. For what it's worth I don't imagine it would be hard to get the new proposal approved (since it's pretty clearly the actual identifier unlike the partial URL slug currently used) and it would also still be easy to replace via bot. --SilentSpike (talk) 20:04, 3 March 2020 (UTC)
How can I tell you that this change will not break anything? Nobody will get an "unexpected value".
I'm tired of explaining my position, so if you still think I'm wrong, you can open a new proposal. I won't do that, sorry. --★ → Airon 90 08:03, 4 March 2020 (UTC)

This is now Done. It was decided not to use a new property, for lack of consensus at Wikidata:Property_proposal/TripAdvisor_ID_2. --SilentSpike (talk) 12:15, 2 July 2020 (UTC)

Please add simple instructions on how to get the Trip Advisor ID

Hi

I'm trying to add a Trip Advisor ID but there are no instructions here how to get it or what it should look like. Please can these be added?

Thanks

--John Cummings (talk) 10:19, 3 March 2020 (UTC)

@John Cummings: I added instructions, though they may not be great. Basically, given a URL like https://www.tripadvisor.com/Attraction_Review-g303961-d324092-Reviews-Bazaar_of_Tabriz-Tabriz_East_Azerbaijan_Province.html, the ID is the gXXXXXX-dXXXXXX part (where X represents a digit). So for this URL, the ID is g303961-d324092. Basically the parts between tripadvisor.com/ and .html that aren't words. :) Trivialist (talk) 11:07, 3 March 2020 (UTC)
@Trivialist: this is great, thanks so much. John Cummings (talk) 14:12, 3 March 2020 (UTC)

Just to note for future readers that this discussion is outdated as the ID format has now been updated to just using digits after -d or -g (if there's no -d in the URL). --SilentSpike (talk) 12:13, 2 July 2020 (UTC)
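
For readers who want to apply the current rule (digits after -d, otherwise digits after -g) to a full URL, here is a minimal sketch in Python. The function name tripadvisor_id_from_url is hypothetical, and the sketch assumes the ID always sits in the path of the URL, as in the example URL quoted in this section.

```python
import re
from typing import Optional
from urllib.parse import urlparse


def tripadvisor_id_from_url(url: str) -> Optional[str]:
    """Return the digits after "-d", or after "-g" if there is no "-d".

    Hypothetical helper; returns None when the path holds neither marker.
    """
    path = urlparse(url).path
    # Prefer the destination number when the page has one.
    destination = re.search(r"-d(\d+)", path)
    if destination:
        return destination.group(1)
    # Otherwise fall back to the region number.
    region = re.search(r"-g(\d+)", path)
    return region.group(1) if region else None
```

Applied to the Tabriz URL quoted above, this returns 324092, which matches the example value listed for Tabriz Bazaar (Q4399) in the documentation.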

Scraping

There are many commercial and open-source scrapers that work on TripAdvisor, but all seem to focus on getting detailed data for a known URL, rather than finding out the URLs themselves. I came up with a crude approach that could produce a dataset, given manual filtering of incorrect results.

curl 'https://duckduckgo.com/?q=!ducky+fort+amherst+site%3Atripadvisor.com' | sed -Ee 's`.*%2Dd([0-9]*)%2DReviews%2D([^%]*)%2D(.*).html\&.*`\1 \2,\3\n`;s`_` `g'
2225973 Fort Amherst,Chatham Kent England

I will experiment on a small scale for fortifications in Kent. Vicarage (talk) 07:52, 20 March 2022 (UTC)
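
The parsing step of the sed expression above can also be written in Python, which may be easier to adapt. This is only a sketch of the text processing, with no network request; the function name parse_review_link is hypothetical, and the region number g123456 in the usage example is a placeholder, not a real value. Unlike the sed version, it decodes the percent-encoding first and assumes the place name itself contains no hyphen.

```python
import re
from typing import Optional, Tuple
from urllib.parse import unquote

# After decoding: -d<digits>-Reviews-<name>-<location>.html
REVIEW_URL = re.compile(r"-d(\d+)-Reviews-([^-]+)-(.+?)\.html")


def parse_review_link(encoded: str) -> Optional[Tuple[str, str, str]]:
    """Extract (ID, name, location) from a percent-encoded review URL.

    Hypothetical helper; returns None if the text does not look like a
    TripAdvisor review link.
    """
    decoded = unquote(encoded)  # turns %2D back into "-", %2F into "/", etc.
    match = REVIEW_URL.search(decoded)
    if match is None:
        return None
    place_id, name, location = match.groups()
    # Underscores stand in for spaces in the URL slug.
    return place_id, name.replace("_", " "), location.replace("_", " ")


if __name__ == "__main__":
    # g123456 is a placeholder region number used only for illustration.
    sample = (
        "https%3A%2F%2Fwww.tripadvisor.com%2FAttraction_Review"
        "%2Dg123456%2Dd2225973%2DReviews%2DFort_Amherst"
        "%2DChatham_Kent_England.html"
    )
    print(parse_review_link(sample))
```

As with the shell version, results would still need manual filtering, since the first search hit is not guaranteed to be the intended place.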
