Wikidata:Property proposal/recognition sequence

recognition sequence / cutting site of restriction enzyme / isoschizomer / neoschizomer / isocaudomer / REBASE Enzyme Number

recognition sequence

Originally proposed at Wikidata:Property proposal/Natural science

Description: DNA sequence recognized by a restriction enzyme, DNA-binding domain, etc., written from 5' to 3'
Represents: recognition sequence (Q7302658)
Data type: String
Domain: restriction enzyme (Q219715), DNA-binding domain (Q13479514), etc.
Allowed values: [ACGTRMWSYKHBDVN]+
Example:
Source: w:List_of_restriction_enzyme_cutting_sites, REBASE

cutting site of restriction enzyme

Originally proposed at Wikidata:Property proposal/Natural science

Description: DNA cutting site of restriction enzyme, written from 5' to 3'
Data type: String
Domain: restriction enzyme (Q219715)
Allowed values: ([ACGTRMWSYKHBDVN]*\^[ACGTRMWSYKHBDVN]*|(\(\d+\/\d+\))?[ACGTRMWSYKHBDVN]+\(\d+\/\d+\))
Example:
Source: w:List_of_restriction_enzyme_cutting_sites, REBASE
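The two allowed-values patterns (recognition sequence and cutting site) can be exercised together. This is a minimal Python sketch, assuming each pattern must match the whole value (as `re.fullmatch` does), and assuming, as a consistency rule the proposal does not state, that deleting the `^` or `(n/m)` annotations from a cutting site should leave exactly the recognition sequence. `is_consistent` is a hypothetical helper name.

```python
import re

# Allowed-values patterns copied verbatim from the two proposals.
RECOGNITION_PATTERN = r"[ACGTRMWSYKHBDVN]+"
CUT_SITE_PATTERN = (
    r"([ACGTRMWSYKHBDVN]*\^[ACGTRMWSYKHBDVN]*"
    r"|(\(\d+\/\d+\))?[ACGTRMWSYKHBDVN]+\(\d+\/\d+\))"
)

# Cut annotations: a caret, or an "(n/m)" offset pair.
CUT_MARKS = re.compile(r"\^|\(\d+/\d+\)")


def is_consistent(recognition: str, cut_site: str) -> bool:
    """Validate both values, then check (assumed rule) that stripping the
    cut annotations from the cutting site yields the recognition sequence."""
    if re.fullmatch(RECOGNITION_PATTERN, recognition) is None:
        return False
    if re.fullmatch(CUT_SITE_PATTERN, cut_site) is None:
        return False
    return CUT_MARKS.sub("", cut_site) == recognition


print(is_consistent("GAATTC", "G^AATTC"))      # caret notation
print(is_consistent("GAAGAC", "GAAGAC(2/6)"))  # offset notation
print(is_consistent("GAATTC", "GAATTC"))       # no cut position given
```

A check of this kind could complement the format constraint, which on its own cannot tell whether a cutting site belongs to the recognition sequence stored on the same item.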

isoschizomer

Originally proposed at Wikidata:Property proposal/Natural science

   Done: isoschizomer (P4873) (Talk and documentation)
Description: isoschizomers of the restriction enzyme, which have the same recognition sequence and the same cutting site.
Represents: isoschizomer (Q644180)
Data type: Item
Domain: restriction enzyme (Q219715)
Allowed values: restriction enzyme (Q219715)
Example:
Source: w:List_of_restriction_enzyme_cutting_sites

neoschizomer

Originally proposed at Wikidata:Property proposal/Natural science

   Done: neoschizomer (P4875) (Talk and documentation)
Description: neoschizomers of the restriction enzyme, which have the same recognition sequence but a different cutting site.
Represents: neoschizomer (Q16945915)
Data type: Item
Domain: restriction enzyme (Q219715)
Allowed values: restriction enzyme (Q219715)
Example:

produces cohesive end

Originally proposed at Wikidata:Property proposal/Natural science

Description: overhang DNA sequence generated by the restriction enzyme, written from 5' to 3'
Represents: sticky and blunt ends (Q4859565)
Data type: String
Domain: restriction enzyme (Q219715)
Allowed values: ([ACGTRMWSYKHBDVN]+\^|\^[ACGTRMWSYKHBDVN]+|\^)
Example:
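The cohesive end can in principle be derived from the cutting site. The proposal does not state which side of the caret in the allowed-values pattern marks a 5' versus a 3' overhang, so this minimal Python sketch reports the overhang type separately instead of guessing that convention. It assumes a palindromic recognition sequence written with a single `^` (the `(n/m)` notation is out of scope); `overhang` is a hypothetical helper name.

```python
def overhang(cut_site: str) -> tuple[str, str]:
    """Return (kind, sequence) for a palindromic site in caret notation.

    Assumes exactly one "^" and a palindromic site, so the bottom-strand
    cut mirrors the top-strand cut.
    """
    top_cut = cut_site.index("^")        # cut position on the top strand
    site = cut_site.replace("^", "")
    bottom_cut = len(site) - top_cut     # mirrored cut, in top-strand coordinates
    if top_cut < bottom_cut:
        return "5' overhang", site[top_cut:bottom_cut]
    if top_cut > bottom_cut:
        return "3' overhang", site[bottom_cut:top_cut]
    return "blunt", ""


for name, site in [("EcoRI", "G^AATTC"), ("KpnI", "GGTAC^C"), ("SmaI", "CCC^GGG")]:
    print(name, *overhang(site))
```

For a palindromic site the single-stranded stretch lies between the two cut positions; equal positions give a blunt end, which this proposal writes as `^`.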

isocaudomer

Originally proposed at Wikidata:Property proposal/Natural science

   Done: isocaudomer (P4915) (Talk and documentation)
Description: isocaudomers of the restriction enzyme, which have a different recognition sequence but produce the same termini
Represents: isocaudomer (Q17000139)
Data type: Item
Domain: restriction enzyme (Q219715)
Allowed values: restriction enzyme (Q219715)
Example:

REBASE Enzyme Number

Originally proposed at Wikidata:Property proposal/Natural science

Description: ID in REBASE (Restriction Enzyme Database)
Represents: REBASE (Q7301611)
Data type: External identifier
Domain: restriction enzyme (Q219715)
Allowed values: [1-9]\d+
Example:
Formatter URL: http://rebase.neb.com/cgi-bin/reb_get.pl?enzname=$1
Motivation

There are many restriction enzymes, each with a specific recognition sequence and cutting site (see w:List_of_restriction_enzyme_cutting_sites), and some enzymes have isoschizomers, neoschizomers or isocaudomers. New properties are needed to add these data to Wikidata.

The notation for DNA sequences and cutting sites is open to some debate. --Okkn (talk) 02:02, 19 February 2017 (UTC)
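For the REBASE Enzyme Number property, the formatter URL turns an identifier into a link by substituting it for `$1`. A minimal Python sketch of that substitution, with a check against the proposed allowed-values pattern; `rebase_url` is a hypothetical helper and `1234` is an arbitrary value used only for illustration. Note that, as written, `[1-9]\d+` requires at least two digits, so a single-digit number would be rejected.

```python
import re

FORMATTER_URL = "http://rebase.neb.com/cgi-bin/reb_get.pl?enzname=$1"
ID_PATTERN = r"[1-9]\d+"  # allowed values, as proposed


def rebase_url(enzyme_number: str) -> str:
    """Validate an identifier against ID_PATTERN, then fill in the formatter URL."""
    if re.fullmatch(ID_PATTERN, enzyme_number) is None:
        raise ValueError(f"not a valid REBASE Enzyme Number: {enzyme_number!r}")
    return FORMATTER_URL.replace("$1", enzyme_number)


print(rebase_url("1234"))  # arbitrary illustrative number
```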

  WikiProject Molecular biology has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.

Discussion

  Question That looks good. My only question: how many known restriction enzymes are there right now, some 500? And what would be the data source for them? Sebotic (talk) 19:12, 27 February 2017 (UTC)

@Sebotic: Thank you for your comment. I don't know how many restriction enzymes are known, but w:List_of_restriction_enzyme_cutting_sites contains more than 1200 enzymes. I will first import data from this list; the restriction enzyme database REBASE is also available. --Okkn (talk) 01:31, 1 March 2017 (UTC)

  Notified participants of WikiProject Medicine ChristianKl 02:54, 12 November 2017 (UTC)

  WikiProject Molecular biology has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead. ChristianKl 18:29, 30 November 2017 (UTC)

  Question Having restriction enzyme data in Wikidata would be very nice, so I'm supportive. But even once the properties have been created, importing data on over one thousand restriction enzymes, with complex values, is not easy. Happily, I recently found out that @Okkn: can write and run bots[6], so I suppose you plan to import the data yourself. So my only question is:

  • If these properties were created, would you import data on over one thousand restriction enzymes into Wikidata?

I ask because if Okkn doesn't import the data, I think no one will import such complex data on over 1,000 restriction enzymes. --Was a bee (talk) 14:12, 20 February 2018 (UTC)

@Was a bee: Yes, of course! I can write code to extract the data from Wikipedia, and I will take responsibility for carrying out the work. --Okkn (talk) 04:50, 21 February 2018 (UTC)
@Okkn: That's nice. Then the remaining issue is the "format" (notation). If there is no w:de facto standard in this area, I think we need to tell readers/users which format is used in Wikidata. For example, if you use the w:REBASE format, I think we should link to this page[7] in the "Example" section of Template:Property documentation, write similar documentation on the property talk page, or something like that. Without information about the format, I think readers/users cannot handle the data well. --Was a bee (talk) 14:02, 21 February 2018 (UTC)
@Was a bee: Thanks for your kind advice. I agree that it is important to give precise information about the format, for both readers and editors. I have added usage notes to the properties above, and I will make an effort to provide clear information about the format. --Okkn (talk) 15:36, 21 February 2018 (UTC)

@Okkn, Was a bee: thanks for this! I have created three of the six proposed properties. For the remaining ones, I need a clarification about the examples. What are the QIDs corresponding to "MspI→HpaII", for instance? According to the proposed datatype, both of these strings should correspond to items. Sorry if that is obvious, I don't know anything about this domain! For the format instructions, feel free to use Wikidata usage instructions (P2559) on the properties to explain the usage. − Pintoch (talk) 20:31, 21 February 2018 (UTC)

@Pintoch: Thank you for creating the properties. The remaining three properties are relationships between restriction enzymes, so their datatype is item, but most restriction enzymes are not yet in Wikidata and have no QIDs. I will soon create items for both "MspI" and "HpaII", using the new properties recognition sequence (P4863), cutting site of restriction enzyme (P4864), and REBASE Enzyme Number (P4866). --Okkn (talk) 22:06, 21 February 2018 (UTC)
@Was a bee: I have created some items as a trial (AaaI (Q49701125), AagI (Q49733183), AasI (Q49734154), AauI (Q49734174), AbaI (Q49734196), AbeI (Q49734216)). Do they look right? I'd like to hear your opinion! --Okkn (talk) 05:01, 22 February 2018 (UTC)
@Okkn: It looks nice. But the last item, AbeI (Q49734216), has two recognition sequence (P4863) statements. Is this OK? --Was a bee (talk) 21:06, 22 February 2018 (UTC)
@Was a bee: That is intended: AbeI (Q49734216) is not palindromic, so its recognition sequence reads differently on the two strands. If only the sequence of one strand were stored in Wikidata, we couldn't write a simple SPARQL query. --Okkn (talk) 21:56, 22 February 2018 (UTC)
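Storing both strands, as done for AbeI, amounts to storing a recognition sequence together with its reverse complement. A minimal Python sketch of that rule, assuming the IUPAC nucleotide codes of the allowed-values pattern; `reverse_complement`, `is_palindromic` and `strand_values` are hypothetical helper names, not part of any Wikidata tool.

```python
# IUPAC codes and their complements (R=A/G pairs with Y=C/T, B pairs with V, etc.).
COMPLEMENT = str.maketrans("ACGTRYMKSWBVDHN", "TGCAYRKMSWVBHDN")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True when both strands read the same 5' to 3' (e.g. GAATTC)."""
    return seq == reverse_complement(seq)


def strand_values(seq: str) -> list[str]:
    """Values to store: one for a palindromic site, one per strand otherwise."""
    return [seq] if is_palindromic(seq) else [seq, reverse_complement(seq)]


print(strand_values("GAATTC"), strand_values("GCAGC"))
```

With both strands stored, a query only has to search the given strand of the target DNA.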
@Okkn: Oh, it seems you've already imported data. How fast! OK, I understand: it's for the sake of SPARQL queries. I don't know SPARQL well, but I'm curious: what can we search for in Wikidata now? For example, if I think "I want to cut DNA here!", can I find an enzyme for that? If you don't mind, could you write an example query? I suppose a real enzyme search would be very complex; I'd just like to see an example. :) --Was a bee (talk) 11:48, 23 February 2018 (UTC)
@Was a bee: Although SPARQL is not very powerful, you can, for instance, use a regex FILTER to find restriction enzymes that recognize the sequence "ACTTGTCATGGCGACTGTCCAGCTTTGTGCCAGGAGCCTCGCAGGGGTTG":
SELECT DISTINCT ?enzyme ?enzymeLabel ?cut
WHERE
{
  ?enzyme wdt:P31 wd:Q49695242;   # instance of (P31) Q49695242
          rdfs:label ?enzymeLabel;
          wdt:P4863 ?seq;         # recognition sequence
          wdt:P4864 ?cut;         # cutting site of restriction enzyme
  FILTER (lang(?enzymeLabel) = "en") .
  # ?seq is used as the regex pattern, so IUPAC codes such as N or W match only the letters N or W
  FILTER regex ("ACTTGTCATGGCGACTGTCCAGCTTTGTGCCAGGAGCCTCGCAGGGGTTG", ?seq)
}
--Okkn (talk) 13:33, 23 February 2018 (UTC)
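The regex FILTER above uses each stored recognition sequence as a literal pattern, so ambiguity codes only match themselves. Outside SPARQL, the codes can be expanded into character classes first. A minimal Python sketch, assuming the IUPAC codes of the allowed-values pattern and searching only the given strand; `iupac_to_regex` and `find_sites` are hypothetical helper names. A lookahead is used so that overlapping sites are all reported.

```python
import re

# IUPAC code -> regex character class of the bases it stands for.
IUPAC_CLASSES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(recognition: str) -> str:
    """Translate a recognition sequence into an equivalent regex pattern."""
    return "".join(IUPAC_CLASSES[base] for base in recognition)


def find_sites(target: str, recognition: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) match."""
    pattern = re.compile(f"(?={iupac_to_regex(recognition)})")
    return [m.start() for m in pattern.finditer(target)]


target = "ACTTGTCATGGCGACTGTCCAGCTTTGTGCCAGGAGCCTCGCAGGGGTTG"
print(find_sites(target, "CCWGG"))   # W expanded to [AT]
print(re.search("CCWGG", target))    # literal pattern, as in the query
```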
Thank you, that's very interesting. I copied that query to the property talk page for other users. --Was a bee (talk) 03:32, 24 February 2018 (UTC)
@Okkn: thanks for creating the items for the examples. Feel free to add them to the |example= section of the {{Property proposal}} template and set |status=ready when you are done. − Pintoch (talk) 19:24, 23 February 2018 (UTC)

I've added "produces cohesive end" (ja: 粘着末端) so that values of "isocaudomer" can be checked: if X is an isocaudomer of Y, X and Y produce the same cohesive end. "Isoschizomer" and "neoschizomer" can likewise be checked using recognition sequence (P4863) and cutting site of restriction enzyme (P4864). @Was a bee: What do you think of this additional proposal? --Okkn (talk) 14:16, 24 February 2018 (UTC)
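The checks described above can be sketched as a pairwise classification. This minimal Python sketch compares single stored values, so non-palindromic enzymes stored with both strands would need extra handling. The cohesive-end strings are written in one assumed caret orientation, since the proposal does not define which side of the caret is which, and treating blunt ends (`^`) as not establishing an isocaudomer relation is an assumption of this sketch. `Enzyme` and `relation` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Enzyme:
    name: str
    recognition: str   # recognition sequence (P4863)
    cut_site: str      # cutting site of restriction enzyme (P4864)
    cohesive_end: str  # produces cohesive end (P4914); "^" for a blunt end


def relation(a: Enzyme, b: Enzyme) -> Optional[str]:
    """Classify a pair by the rules stated in the property descriptions."""
    if a.recognition == b.recognition:
        return "isoschizomer" if a.cut_site == b.cut_site else "neoschizomer"
    # Assumption: shared blunt ends ("^") alone do not make isocaudomers.
    if a.cohesive_end == b.cohesive_end and a.cohesive_end != "^":
        return "isocaudomer"
    return None


# Cohesive-end strings use an assumed orientation of the caret.
MSPI = Enzyme("MspI", "CCGG", "C^CGG", "CG^")
HPAII = Enzyme("HpaII", "CCGG", "C^CGG", "CG^")
SMAI = Enzyme("SmaI", "CCCGGG", "CCC^GGG", "^")
XMAI = Enzyme("XmaI", "CCCGGG", "C^CCGGG", "CCGG^")
BAMHI = Enzyme("BamHI", "GGATCC", "G^GATCC", "GATC^")
BGLII = Enzyme("BglII", "AGATCT", "A^GATCT", "GATC^")

for x, y in [(MSPI, HPAII), (SMAI, XMAI), (BAMHI, BGLII), (MSPI, BAMHI)]:
    print(x.name, y.name, relation(x, y))
```

Applied to existing statements, a mismatch between a stored isoschizomer, neoschizomer or isocaudomer value and the result of such a check would flag an item for review.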

@Okkn: I think it's good for robustness, as you mentioned, and for the coherence of the data structure as a whole. But I feel the current style, in which a blunt end is represented implicitly by the absence of data, has one problem in practice: ordinary users cannot tell whether "no one has added the data yet" or "this enzyme generates a blunt end". In that sense, an explicit style may be better, for example ([ACGTRMWSYKHBDVN]+\^|\^[ACGTRMWSYKHBDVN]+|\^) or ([ACGTRMWSYKHBDVN]+\^|\^[ACGTRMWSYKHBDVN]+|blunt\send), or something like that. --Was a bee (talk) 08:54, 25 February 2018 (UTC)
@Was a bee: I'd like to adopt your "\^" style! You're a genius! I couldn't think of a way to represent blunt ends. Thank you. --Okkn (talk) 19:09, 25 February 2018 (UTC)
@Okkn: That seems good, because Wikidata is a multilingual project and that notation is language-independent. One last point from me: is the label "produces cohesive end" still appropriate, given that the data include both cohesive and blunt ends? I don't know the terminology well, but if there is a more suitable expression, I think it would be better to reword the label. --Was a bee (talk) 10:23, 2 March 2018 (UTC)
@Was a bee: You are right. A blunt end is neither a "cohesive end", a "sticky end", nor a "protruding end". However, as far as I know, there is no proper term covering both cohesive and blunt ends. The article w:Sticky and blunt ends opens with the words "DNA ends", but with such a label it would be hard to understand why this property takes DNA sequences as values. What matters is the sequence of the cohesive end, and the value "^" simply means "null" or "none", so I think "cohesive end" is acceptable in the name of this property. --Okkn (talk) 02:42, 4 March 2018 (UTC)
Thank you for the explanation. I understand the circumstances now, and all my questions have been resolved, so I am explicitly adding the icon here:   Support. --Was a bee (talk) 10:16, 4 March 2018 (UTC)
@Was a bee: Thank you so much! I couldn't have done it without you! --Okkn (talk) 15:15, 4 March 2018 (UTC)

@Was a bee, Sebotic, ArthurPSmith, Okkn, Pintoch, ChristianKl:   Done: produces cohesive end (P4914). − Pintoch (talk) 23:44, 4 March 2018 (UTC)
@Was a bee, ChristianKl, ArthurPSmith, Okkn, Pintoch, Sebotic:   Done: isocaudomer (P4915). − Pintoch (talk) 23:47, 4 March 2018 (UTC)

Oops, sorry about these two pings - my script isn't quite ready for pages with multiple proposal templates... − Pintoch (talk) 23:49, 4 March 2018 (UTC)

@Pintoch: Never mind. Thank you for creating so many properties! --Okkn (talk) 03:36, 5 March 2018 (UTC)