Searching Provenance
Shankar Pasupathy, Network Appliance
PASS Workshop, Harvard, October 2005

Outline
- Why does NetApp care?
- Questions we'd like to ask
- Use Google to search provenance?
- Novel uses of searching provenance
- Distributed search of provenance
- Metrics

Why does NetApp care?
- Workflows (ILM)
  - Morgan Stanley wants to track how financial reports are generated
  - Set backup policies on workflows
- Generic search
  - Relevance of a document when ranking results
  - What sources were used to create the document?
  - How authoritative are the sources?
- Audit trails
- Don't back up stuff that can be easily recreated
  - E.g., object files (.o)
  - Thumbnails of images

Questions we'd like to ask
- Forward and reverse queries
  - Given file "foo", tell me the workflow that it's part of
  - What are all the data sets I can recreate easily from "foo"? (This needs a notion of how long it takes to create the descendants of "foo".)
  - How much space will I save?
  - Given file "foo", tell me exactly how to recreate it
- Fine-grained query
  - Which parts of other files (offset, length) were used to create "foo"?

Use Google?
- Use Google to index all provenance information out there
  - No security (we could fix that)
  - Is it too heavyweight?
  - Are queries well structured?
  - Perhaps SQL and a relational database work well

Query mechanisms
- What the user sees
- How would you modify NFS/CIFS to support provenance queries?
- Could you do something interesting via the file system?
  - E.g., look up an extended attribute whose name is well known to get the provenance tree
  - Modify fstat?
- Visual representation of provenance
- Notification of provenance changes

Query language
- User access rights
  - Is uid/gid enough?
  - We probably want to be able to specify roles and domains that the questioner has control over
  - What if you don't have permission to view a region of the provenance tree?
  - What about the provenance system itself?
- Is SQL good enough?
- RDF ontology/SPARQL query language

Other uses of provenance
- Determine trust in, and authority of, the authors of documents
  - A provenance tree is similar to a paper citation tree
  - Who's cited most often?
  - Use that to improve search results
- Craig Soules' work
  - Capture relationships among files over a period of time

Distributed querying of provenance
- Data may have been produced from sources on different computers
- Distributed querying => common format for recording provenance
  - What is that format?
- How do you describe partial or incomplete answers?
  - Parts of the distributed provenance tree may not be available

Metrics
- How do you compare PASS query systems?
  - Performance
  - Relevance
  - Benchmarks
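The forward and reverse queries on the "Questions we'd like to ask" slide can be sketched as plain graph walks over provenance records. This is a minimal sketch, not a PASS interface: it assumes a hypothetical in-memory store in which each derived file maps to the list of files it was produced from. A reverse query walks those edges toward sources ("how was foo made?"); a forward query walks the inverted edges toward descendants ("what can I recreate from foo?"). The hypothetical space_saved helper only sums descendant sizes and ignores whether their other inputs are still available.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical provenance store: each derived file maps to the files it was
# produced from. Real provenance records would carry much more detail.
Provenance = Dict[str, List[str]]


def reachable(edges: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first walk returning every node reachable from start via edges."""
    seen: Set[str] = set()
    queue = deque(edges.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(edges.get(node, []))
    return seen


def invert(prov: Provenance) -> Dict[str, List[str]]:
    """Turn 'output -> inputs' edges into 'input -> outputs' edges."""
    derived: Dict[str, List[str]] = {}
    for output, inputs in prov.items():
        for source in inputs:
            derived.setdefault(source, []).append(output)
    return derived


def reverse_query(prov: Provenance, target: str) -> Set[str]:
    """Every file that target was derived from, directly or transitively."""
    return reachable(prov, target)


def forward_query(prov: Provenance, source: str) -> Set[str]:
    """Every file derived from source, directly or transitively."""
    return reachable(invert(prov), source)


def space_saved(prov: Provenance, sizes: Dict[str, int], source: str) -> int:
    """Bytes skipped if every descendant of source is recreated, not backed up.

    Simplification: assumes the descendants' other inputs remain available.
    """
    return sum(sizes.get(f, 0) for f in forward_query(prov, source))
```

For example, with prov = {"foo.o": ["foo.c", "foo.h"], "bar.o": ["bar.c", "foo.h"], "app": ["foo.o", "bar.o"]}, forward_query(prov, "foo.h") returns {"foo.o", "bar.o", "app"}, which is the set of object files and binaries a backup policy could skip if foo.h and the sources are kept.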
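The slides ask whether SQL and a relational database are good enough for provenance queries. As a rough illustration, the sketch below stores provenance edges in SQLite (via Python's standard sqlite3 module) and answers both the transitive "what was foo made from?" query, using a recursive common table expression, and the fine-grained "which byte ranges of other files were used to create foo?" query. The derived_from table and its byte_offset/byte_length columns are a hypothetical schema chosen for this example, not a format proposed in the talk.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple

# Hypothetical schema: one row per (output, input) edge. The optional byte
# range records which part of the input was read (NULL if unknown/whole file).
SCHEMA = """
CREATE TABLE derived_from (
    output      TEXT NOT NULL,
    input       TEXT NOT NULL,
    byte_offset INTEGER,
    byte_length INTEGER
);
"""

# Recursive CTE: start from the direct inputs of the target, then keep adding
# the inputs of anything already found. UNION (not UNION ALL) drops duplicates,
# which also stops the recursion if the graph contains a cycle.
ANCESTORS = """
WITH RECURSIVE ancestry(name) AS (
    SELECT input FROM derived_from WHERE output = ?
    UNION
    SELECT d.input FROM derived_from AS d JOIN ancestry AS a ON d.output = a.name
)
SELECT name FROM ancestry ORDER BY name;
"""

# Fine-grained query: only edges that recorded a byte range.
BYTE_RANGES = """
SELECT input, byte_offset, byte_length FROM derived_from
WHERE output = ? AND byte_offset IS NOT NULL
ORDER BY input, byte_offset;
"""

Edge = Tuple[str, str, Optional[int], Optional[int]]


def open_store(edges: Iterable[Edge]) -> sqlite3.Connection:
    """Create an in-memory provenance store and load the given edges."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO derived_from VALUES (?, ?, ?, ?)", edges)
    return conn


def ancestors(conn: sqlite3.Connection, target: str) -> List[str]:
    """All files target was derived from, directly or transitively, sorted."""
    return [row[0] for row in conn.execute(ANCESTORS, (target,))]


def byte_ranges(conn: sqlite3.Connection, target: str) -> List[Tuple[str, int, int]]:
    """(input, offset, length) triples that were read to produce target."""
    return conn.execute(BYTE_RANGES, (target,)).fetchall()
```

Whether this stays "good enough" at scale, or under the access-control questions raised on the query-language slide, is exactly the open issue; the sketch only shows that both query shapes are expressible in standard SQL.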
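The "Other uses of provenance" slide compares a provenance tree to a paper citation tree and suggests using "who's cited most often" to improve search results. A minimal sketch of that idea, under the assumption that a file's authority is simply the number of distinct derived files that list it as a direct input (its in-degree in the provenance graph), might look like this. A PageRank-style score over the same graph would be a natural refinement; both the scoring rule and the rerank helper here are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical provenance store: derived file -> files it was produced from.
Provenance = Dict[str, List[str]]


def citation_counts(prov: Provenance) -> Counter:
    """How many distinct derived files use each file as a direct input."""
    counts: Counter = Counter()
    for inputs in prov.values():
        # A file that reads the same input twice still "cites" it only once.
        for source in set(inputs):
            counts[source] += 1
    return counts


def rerank(hits: List[str], prov: Provenance) -> List[str]:
    """Order search hits by citation count, most-cited first.

    Files never used as an input score 0; sorted() is stable, so hits with
    equal counts keep the order the underlying search engine returned.
    """
    counts = citation_counts(prov)
    return sorted(hits, key=lambda name: -counts[name])
```

For instance, if q3.csv feeds three derived documents and memo.txt feeds one, rerank(["memo.txt", "draft.txt", "q3.csv"], prov) moves q3.csv to the front and leaves the uncited draft.txt last.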