Time Nick Message
11:17 tumer http://library.neu.edu.tr/kohanamespace/koha2index.xsl
11:16 thd tumer: I want to understand this perfectly
11:15 thd tumer: I would like to see everything related to how your are indexing now
11:15 tumer one sec
11:14 thd tumer: would you zip or tar your xslt files so that I can see them?
11:14 tumer if there is nothing else i have to go for dinner now
11:13 thd tumer: yes, Sebastian corrected him. But I misunderstood what out of the box meant.
11:12 tumer not out of the box, pay ID and they will write it
11:12 tumer but he suggested xelem which does not exist
11:12 tumer xpath enabled makes bigger indexes and is slow
11:11 thd tumer: Marc wrote that you could speed things up with xpath enabled by indexing xpaths
11:11 tumer no it is sloe in indexing not in retrieval
11:10 thd tumer: I had assumed the slowness for xpath enabled was a function of allowing xpath in queries
11:09 tumer 100K metarecords less than 10 min
11:09 tumer yes
11:08 thd tumer: ahh, so your xslt method is much faster
11:08 tumer i do not use xpath enabled indexing it is slow and so it says ID documentation
11:06 tumer yes but elem100$a does not distingush differnt datafields this does
11:06 dewey i already had it that way, thd.
11:06 thd tumer: so if the same datafield is not being indexed you use the devel_week method of elem 100$a or whatever because it is faster?
11:05 thd tumer: so if the same datafield is not being indexed you use the devel_week method of elem 100$a or whatever because it is faster?
11:05 tumer cause they have differnt xpaths
11:04 tumer 001 bibliographic for biblionumber 001 holdings for itemnumber etc
11:04 tumer so its a hybrid xpath indexing. xpath only allows me to index same datafields with differnt indexes
11:04 thd tumer: I do not quite understand what you mean by choosing which paths to index except as opposed to indexing every arbitrary and unneeded path
11:02 tumer and only those
11:02 tumer i choose which paths to index
11:02 tumer similar structure
11:02 tumer i choose what to index
11:02 thd tumer: so you still have elem 100$a sometimes?
11:02 tumer similar to record.abs
11:01 tumer i do not index everything
11:01 tumer no i do not
11:01 tumer i have the shortest path to bibliographic record keeping in sync with marc21
11:01 thd tumer: do you not XPATH everything now?
11:00 thd tumer: do you not XPATH everything now?
11:00 thd tumer: we might be able to design meta-records with shorter XPATHs
11:00 tumer but having said that if you xpath everything than i think it will be slow and cumbersome
10:59 thd tumer: yet that is important
10:59 tumer at indexing it makes it faster not for searching
10:58 thd tumer: my question is really about whether shorter XPATHs make a difference in performance
10:58 tumer i index xpath with xslt stylesheets
10:58 thd tumer: we should have some answer from Index Data about how to get maximum performance from XPATH
10:58 tumer i already answered your xpath question i thought
10:57 thd tumer: Joshua was willing to rephrase my XPATH indexing question
10:56 tumer i even slowed my realtime updating to within 2 minutes, safer on zebra db
10:55 thd tumer: well then we are thinking along similar paths
10:55 tumer correct
10:55 thd tumer: those could be updated by a batch process while the smaller records were updated in real time
10:54 tumer same
10:54 tumer i was thinking along the smae lines
10:54 thd tumer: what would be wrong with having a supplementary database of records which was slower too index
10:53 thd :)
10:53 tumer hope so
10:53 thd tumer: yet, It is intended for release in due course
10:52 tumer yes
10:52 tumer not released yet
10:52 thd tumer: so I understand that this only works in CVS now
10:52 tumer yes but with cvs zebra
10:51 thd tumer: Does your meta-record work?
10:51 tumer no meta-record indexing was possible
10:51 thd tumer: why, what is the problem?
10:50 tumer you could not do this with existing version of zebra anyway. only forthcoming zebra
10:50 thd tumer: maybe there could be a supplementary database with larger meta-records which was slower to index
10:49 thd tumer: I did not ask my question of Index Data correctly the first time
10:49 tumer they have answered before saying its slow on indexing
10:48 thd tumer: they have not responded
10:48 thd tumer: that is also my concern which is why I asked Index Data about the efficiency of XPATH indexing
10:47 tumer thats the only concern i have
10:46 thd tumer: do you think that would be a performance problem?
10:46 tumer having a too big record to index is my concern
10:46 thd tumer: paul has had the references and tracings working for building queries slowly for a couple of years
10:46 tumer why dropped out?
10:45 thd tumer: what is your hesitation over feasible?
10:45 thd tumer: paul has had the references and tracings working for building queries slowly for a couple of years
10:44 tumer i did not say difficult, its feasable
10:44 thd tumer: what is difficult about adding copies of authorities to meta-records
10:43 thd tumer: Koha is already an advanced system
10:43 tumer well it is an advanced system
10:42 thd ?
10:42 thd tumer: do you not see that as a significant advantage for every user as long as the user has the option of turning the behaviour on or off for the query
10:41 thd tumer: subject searches with a small number of query terms in a large collection tend to give much larger result sets and the user needs help from the system.
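The XSLT-driven indexing tumer describes above ("i index xpath with xslt stylesheets", and the koha2index.xsl link at 11:17) follows the pattern of Zebra's alvis filter, where a stylesheet maps chosen MARCXML xpaths to named indexes. A minimal sketch of what such a stylesheet might look like, based on the Zebra manual's alvis examples; the index name and the choice of 245$a are illustrative, not taken from tumer's actual file:

```xml
<!-- koha2index.xsl-style sketch (illustrative, not tumer's file) -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:z="http://indexdata.dk/zebra/xslt/1"
    xmlns:marc="http://www.loc.gov/MARC21/slim">

  <!-- Wrap output in z:record; Zebra's alvis filter reads z:index children -->
  <xsl:template match="/">
    <z:record>
      <xsl:apply-templates/>
    </z:record>
  </xsl:template>

  <!-- The match xpath decides which datafield feeds which register, which is
       why 100$a and 700$a (or bibliographic 001 and holdings 001) can be
       indexed differently, and why only the chosen paths get indexed -->
  <xsl:template match="marc:datafield[@tag='245']/marc:subfield[@code='a']">
    <z:index name="title:w"><xsl:value-of select="."/></z:index>
  </xsl:template>

  <!-- Silence all other text nodes -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

Only the paths with explicit templates produce index terms, which matches tumer's "i choose which paths to index and only those".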
10:39 thd tumer: if the best record for addressing some problem that I am trying to solve is in a larger set with 10 records then I need the 10 record result set and 3 records was insufficient
10:38 thd tumer: if my very precise but uninformed (ignorant of the actual database content for authorised forms) query returns 3 bib records I may or may not be satisfied.
10:36 thd tumer: we should not deprive users of better results
10:36 thd tumer: library systems allow better results than Google
10:35 thd tumer: the Google mentality is that any results are good enough
10:34 thd tumer: yet, most users fail to do successful subject searches because they seldom choose the correct authorised terms
10:33 dewey i already had it that way, thd.
10:33 thd tumer: paul just wants to be certain that behaviour is not default for his users so as not to surprise them when the result set does not match the query terms precisely
10:33 thd tumer: they can always have that option
10:33 tumer my experience is we are giving too many answers to the user they prefer lesser precise answers
10:32 thd tumer: some FR*R relationships are not explicitly defined and would require something extra
10:31 thd tumer: this is merely an explicitly defined relationship contained in library systems records
10:30 thd tumer: paul just wants to be certain that behaviour is not default for his users so as not to surprise them when the result set does not match the query terms precisely
10:29 thd tumer: this is not quite FRSAR it is basic linking that any system should be able to do yet only Sirsi Unicorn does to my knowledge
10:27 thd tumer: the system can still give people good results faster and they can still use authorities to refine their query afterwords
10:26 tumer i see where you are heading thd
10:26 thd tumer: users growing up with Google are unlikely to have patience to be slow and careful most of the time
10:25 thd tumer: that is the slow careful good method
10:24 tumer the proxy or irc is blocking me today
10:24 thd tumer: I can do that now for more than one authority by building the query one authority at a time
10:24 thd tumer: I can do that now for more than one authority by building the query one authority at a time
10:23 thd tumer: yes
10:23 tumer so you want to find maize by searching corn
10:23 thd tumer: maize is no longer the authorised heading under LCSH
10:22 thd tumer: so for example maize is a food plant native to North America
10:22 tumer i see
10:22 tumer or whatever
10:22 thd tumer: with authorities you can find authorised headings by searching the 4XX 5XX in authority records for non-authorised forms
10:22 tumer s/enligten/
10:21 tumer what else are we looking for, enlight me
10:21 thd tumer: yes but the 650 100 700 only contains the authorised heading
10:20 tumer or 100 or 700 for that matter
10:20 thd tumer: this would allow finding records with the conjunction of two subject headings without knowing the precise headings
10:20 tumer but bibliographic record alraedy has 650 with authorities filled in
10:19 thd tumer: the only way that would work for indexing is if the meta-record contained authorities
10:18 thd tumer: so instead of building the query slowly for more than one authorised heading the user types in whatever terms come to mind
10:18 thd tumer: however, there could be an option to search the authorities references and tracings directly from the search form collectively
10:17 dewey okay, thd.
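What thd proposes with the maize/corn example amounts to query expansion through authority see-from references: a 4XX tracing ("Maize") leads to the authorised 1XX heading ("Corn"), and the search then covers both. A toy sketch of that idea; the authority data and function names here are invented for illustration and are not Koha's API:

```python
# Toy sketch of authority-based query expansion: map see-from (4XX)
# tracings to the authorised heading (1XX), then search with both.
# The authority records below are invented; Koha/Zebra would supply real ones.

AUTHORITIES = [
    # (authorised 150 heading, see-from 450 tracings)
    ("Corn", ["Maize", "Indian corn"]),
    ("Automobiles", ["Cars (Automobiles)", "Motorcars"]),
]

def expand(term):
    """Return the set of search terms implied by an input term:
    the term itself plus any authorised heading whose tracings match."""
    terms = {term.lower()}
    for heading, tracings in AUTHORITIES:
        if term.lower() in (t.lower() for t in tracings):
            terms.add(heading.lower())
    return terms

print(sorted(expand("maize")))   # prints ['corn', 'maize']
```

A user typing "maize" thus also retrieves records catalogued under the authorised heading "Corn", without building the query one authority at a time.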
10:17 thd tumer: it is also often necessary because the user may never successfully guess the authorised heading successfully unless the user is a librarian with years of experience or otherwise especially familiar with the authorised headings needed
10:15 thd tumer: that is a good careful but not extra quick way of performing searches using authorities
10:14 thd tumer: currently to fill authorised headings in the search form a separate search must be done for each authorised heading
10:13 thd tumer: currently to fill authorised headings a separate search must be done for each authorised heading
10:12 thd tumer: in authorities you can search by references and tracings to search for the authorised form using non-authorised forms
10:11 thd tumer: maybe I am using an out of date set of documentation
10:11 thd tumer: let me give a good example for the trivial one I gave last night
10:11 tumer get the new one from their cvs
10:10 thd tumer: ok, I have not read it thoroughly enough
10:10 tumer yes thats what i said
10:10 thd tumer: yes, is that in the ID Zebra documentation?
10:09 tumer ID zebra documented
10:09 thd tumer: is that documented?
10:09 tumer yes
10:09 thd tumer: you use xsl files for indexing?
10:08 tumer there is no more abs file, a whole bunch of xsl files
10:08 thd tumer: would you commit your abs file to me?
10:07 tumer thd:yes on IRC meeting thats whats agrred
10:07 thd tumer: what does it harm if you commit early. Oh, does that harm synching?
10:06 tumer synching and so on
10:06 thd tumer: I am ready :)
10:06 tumer that whats agreed
10:06 thd tumer: why wait for them?
10:06 tumer i will commit when toins and paul are ready
10:05 thd tumer: I understand but I want to see exactly so that I can understand perfectly what you are doing
10:05 tumer it will not change the way you query though
10:04 tumer all 3.0 stuf
10:04 thd tumer: when do you plan to commit your abs file?
10:04 tumer even if they do not exist now
10:04 tumer i index every field that i will require
10:04 tumer thd: no i still use the old way of searching
10:03 thd tumer: I just posted a hypothetical example query from Sebastian in February
10:03 thd tumer: you are on now
10:02 thd tumer: but you have no queries like find @attr 1=/*/datafield[@tag='245']/subfield[@code='a'] someterm
10:02 tumer whtas wrong with irc am i on or not?
10:02 tumer so probably we are missing something
10:01 tumer i had problems using XML::Libxml same problem we had with MARC::File::Usmarc
10:01 thd tumer: so your queries are no different than without XPATH?
10:00 paul hdl may be interested by this tumer, as he should play with encoding problems next weeks
10:00 tumer thd: i use xsl stylesheets with xpath indexing
10:00 paul thd: nope
09:59 tumer paul: i think we have not solved the utf8 problem either. We just managed to get over it
09:59 paul (at least I really hope, because it needed almost 1 hour to write !)
09:59 thd tumer: before you leave, do you use xpath in your queries?
09:59 paul thd : look at my mail on koha-devel, i hope it's self explanatory
09:59 thd paul: Are suggesting changing both CVS and install then to something matching, not keeping either the same as now?
09:58 tumer k
09:58 paul but maybe we should continue to speak of this on mailing list, to let other express their opinion
09:57 paul thd : yes indeed.
09:57 tumer yes thats what i am saying
09:57 tumer so even untaring it should be enough if you are not running upgrade
09:57 thd paul: Do you mean that your original suggestion was to have CVS organised the same as the install?
09:56 tumer so why not have it as it should be once installed as well from the beginning
09:56 tumer well 2_2 installer does that on windows
09:56 paul yes, in my structure they are.
09:55 tumer so separate htdocs for both opac and intranet
09:55 paul I think 4 is OK
09:55 paul so, 2x2 or 4 ?
09:55 paul tumer: i'm not sure we need this additional level, as it will just contain 2 sub dirs
09:55 tumer opac -> cgi-bin & htdocs
09:55 tumer intanet -> cgi-bin & htdocs
09:55 paul thd : yep.
09:54 paul (I just separate htdocs & non htdocs for templates)
09:54 thd tumer: you mean that you want CVS to be organised the same as the install?
09:54 tumer my installer makes koha ->intranet &opac
09:54 paul tumer : I agree with you, as what I suggest is almost the install structure in fact ;-)
09:53 tumer i always thought you keep cvs this way to make it DIFFICULT to install
09:53 tumer paul: why dont we have the structure as it installs
09:53 thd hello tumer
09:52 dewey hi tumer is still strugling
09:52 toins hi tumer
09:52 tumer hi paul and toins and thd
09:52 thd tumer: do you use xpath in your queries
09:52 paul (as well as toins)
09:52 paul hi tumer, i'm here
09:52 tumer yes looking for paul
09:52 thd tumer: are you really here
09:50 thd slef: are users the problem?
09:50 paul thd : lol
09:49 thd slef: you fix users instead of problems?
09:21 thd kados: yes I just checked the koha-cvs logs
09:21 kados thd: he hasn't committed his stuff yet
09:20 kados phone running off the hook today ...
09:20 kados I will try to respond today
09:20 kados yes, I very much agree
09:20 paul ah, ok
09:20 kados that was youre response to MJ :-)
09:19 kados wait ...
09:19 kados paul: i agree with much of what he says there
09:19 paul not yet arrived.
09:19 kados paul: did you see mj's response?
09:05 thd kados: are you still there?
09:03 thd kados: I only see devel-week related files for zebra in CVS
09:02 thd kados: I am suspecting that tumer has not yet committed his recent improvements
09:01 thd paul: I do not find any abs files using xpath in CVS. Am I looking in the wrong place?
09:00 thd kados: his indexing fields differently with xpath
08:59 thd good morning paul
08:59 thd kados: tumer has implemented his schema for bibliographic and holdings records in one record
08:59 paul (& good morning to kados & thd)
08:59 paul &kados, take time to read my mail on koha-devel & what you think of my suggestion.
08:58 kados no
08:58 thd kados: did you see the conversation I had with tumer last night in the logs?
08:57 kados thd: of course :-)
08:54 thd kados: are you there?
02:37 toins hello
02:37 osmoze hello all
02:37 hdl hi
01:11 ai any here ?? can give me some help plz
01:10 ai hi,
18:48 tumer will
18:48 thd tumer: good night
18:48 thd tumer: look at that pdf link when you are awake
18:47 tumer well nice talking to you thd, i'll check those later i have to go to sleep now. G'night!
18:46 thd tumer: try this though which includes FRAR and FRSAR http://www.kaapeli.fi/~fla/frbr05/delseyModeling%20subject%20access.pdf
18:45 tumer power fights
18:45 thd yes this one http://www.oclc.org/research/presentations/oneill/frbrddb2.ppt
18:45 tumer i have seen a powerpoint at oclc i think it was
18:44 thd tumer: I have a simple power point for you
18:44 thd tumer: do you understand well what FRBR does?
18:42 thd tumer: I used to cause stack overflows at a fairly distant libraries circulation system by borrowing too much at one time to avoid the 5 hour round trip commute
18:41 tumer my user only take what i give them
18:41 thd tumer: you need me breaking your system with a single query
18:40 thd tumer: your users are not demanding enough
18:40 tumer thd:i understand the question. It did not pose a problem to me yet. if you have such a need than you have to include them in your meta-record
18:39 thd together in the same meta-record
18:39 thd tumer: storing copies of related records together should solve that problem at the indexing level
18:38 thd tumer: If my result set is small it may be manageable but if my result set is large it is the same problem as with 10,000 biblio matches and knowing which are in a particular library
18:37 tumer well its endless i know
18:36 thd tumer: yet suppose I do a subject search and want to sort by the ones with the largest number of holdings or some other factor of most used
18:35 thd tumer: yes if you only want to retrieve matches from one authority at a time
18:34 tumer searching the authorities (separately) during retrieval is fast enough already
18:33 thd tumer: given the indexing limitations of Zebra including authorities in the meta-record seems to me the only way to do interesting things with authorities
18:31 thd tumer: including authorities in the meta-record should give you the same advantage as including holdings in the meta-record with bibliographic records
18:30 tumer i know but i have to look into some benchmarks
18:29 tumer try again i think now they have it
18:29 thd tumer: do you understand the indexing advantage of having related copies of authorities in the same meta-record as bibliographic records?
18:29 thd tumer: I downloaded it last about a year ago
18:29 tumer typical brits
18:28 thd tumer: I downloaded from a BL link and they left it out with a nice note saying that I could buy commercial support from the original developer if I wanted more default features
18:27 tumer well british library provides the one for unimarc both ways
18:26 thd tumer: I have usemarcon but I have no functioning configuration files for doing any conversion
18:25 thd tumer: yes many duplicates but very efficient retrieval by authority tracings and references
18:24 tumer yes it uses usemarcon utility from any marc to anymarc
18:24 thd ?
18:24 thd tumer: really, does it do the inverse as well
18:24 tumer and duplicate every authority for 1000 times?
18:23 tumer yazproxy converts from marc21 to unimarc on the fly
18:23 thd tumer: I think it will be very useful for you to have related authorities in the same meta-record as a bibliographic record
18:22 thd tumer: not merely something that can work
18:22 tumer thats what i do for holdings
18:22 thd tumer
18:22 thd tumer: I am trying to find the maximally efficient method for indexing
18:21 tumer than you can index any part of meta-record as you wish
18:21 thd and separate them by a different XPATH
18:21 tumer that is possible
18:21 thd tumer: I want to put them in the same meta-record
18:20 tumer same server but different db names will do
18:19 thd tumer: and if you look closely I want to be able to index UNIMARC and MARC 21 and anything else in the same DB.
18:18 thd tumer: the answer for efficiency was a focus of my koha zebra list question
18:17 tumer i am not quite sure yet
18:16 thd tumer: is retrieval efficiency a function of the XPATH length in any way?
18:15 tumer koharecord/holdings/record/datafield(856)/subfield(a)
18:14 thd tumer: What do you use?
18:14 tumer although i do not use that any more. Thats dev-week indexing
18:14 tumer yes
18:14 thd tumer: and that works for MARCXML in addition to MARC?
18:13 tumer or just melm 245 Title
18:13 tumer syntax is melm 700$a Author melm 700$d Date etc.
18:12 thd tumer: what method do we use to distinguish the subfields from different fields?
18:11 tumer i index 952 at 25 different subfield level
18:11 tumer no we index at either level field or subfield by subfield
18:10 thd tumer: I thought we were only indexing fields mostly not subfields specifically
18:10 tumer no they are melm 700$a and melm 200$a
18:10 thd tumer: in MARCXML at the element level all $a look alike
18:10 tumer well we already do dont we?
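The melm lines tumer quotes come from Zebra's GRS-1 .abs configuration, where a MARC field or a single subfield is mapped to a named index register. A minimal record.abs fragment in that dev-week style might look like this; the index names are illustrative, not Koha's exact set:

```
# record.abs fragment (dev-week style sketch; index names illustrative)
attset bib1.att
esetname F @
# melm maps a whole field or one subfield to a named register,
# which is how 700$a stays distinct from 200$a even though both
# look like a plain $a at the MARCXML element level
melm 245        Title
melm 700$a      Author
melm 700$d      Date
melm 952$p      barcode
```

Indexing subfield by subfield this way is what lets tumer index 952 at 25 different subfield levels.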
18:09 thd tumer: the problem paul was trying to answer was how to index $a in 200 differently from $a in 700
18:07 tumer yes i read it but did not understand MARC's suggestion
18:07 thd tumer: did you see that part of my koha-zebra list message?
18:07 tumer i am not quite sure what he said
18:06 thd tumer: but about what Marc told paul
18:06 tumer yes and you may work and improve on this. not in my priorities list
18:05 thd tumer: if Koha can provide a design for such a major preoccupation of library science for the past 10 years then it could be very important for Koha to.
18:03 tumer and not much important for me at this stage for koha
18:03 thd tumer: yes but very primitive for works and expressions only and not efficient
18:02 tumer from marcxml to frbr
18:02 tumer i know but there is a conversion route defined by loc
18:01 thd at least not yet
18:01 thd tumer: FRBR is not a record type
18:01 tumer on the fly conversion to any type
18:00 tumer using the new alvis filter of zebra and an xmlstyle sheet i can index anything and now i can get DC,MOD and FRBR from my meta-record
18:00 thd or FRAR, FRASR, etc.
17:59 thd tumer: yes but it does not have one for FRBR
17:59 tumer thd:marcxml already has type definition for bibliographic,holdings,authorities,community etc that you can set as <record> attribute
17:53 thd tumer: my purpose is this http://wiki.koha.org/doku.php?id=en:development:super_meta_record_db
17:53 thd tumer: http://lists.nongnu.org/archive/html/koha-zebra/2006-08/msg00001.html
17:51 tumer i dont remember this
17:50 thd tumer: Paul was satisfied with a slower method for what he had asked about indexing authorities at the time
17:49 thd tumer: In the message that I quoted, Marc tells Paul there is a fast method
17:49 tumer did i miss that?
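Tumer's point about getting DC, MODS, or FRBR from the meta-record "on the fly" reflects how the alvis filter separates one indexing stylesheet from any number of retrieval stylesheets. A sketch of such a filter configuration, loosely after the Zebra manual's alvis example; the file names and schema list here are invented, not tumer's actual setup:

```xml
<!-- alvis filter config sketch; zebra.cfg would reference this file
     via its recordType setting. File names are illustrative. -->
<schemaInfo>
  <!-- the schema carrying the zebra xslt identifier drives indexing -->
  <schema name="index" identifier="http://indexdata.dk/zebra/xslt/1"
          stylesheet="koha2index.xsl"/>
  <!-- the other schemas are applied at retrieval time, on the fly -->
  <schema name="dc"   stylesheet="marcxml2dc.xsl"/>
  <schema name="mods" stylesheet="marcxml2mods.xsl"/>
</schemaInfo>
```

One stored meta-record can thus be served in several presentation formats without re-indexing, at the cost tumer notes: retrieval returns XML rather than a compact MARC record, roughly twice the size.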
17:49 thd tumer: In my koha-zebra list message I quoted an Index Data zebra list message from February
17:48 tumer about twice the size of marc-record
17:48 tumer lots of verbal coming through
17:47 thd tumer: why is getting XML necessarily slower?
17:47 tumer but i could not find any other way
17:47 tumer the retrieval is slower cause i have to get xml and not marc-record
17:47 thd tumer: slow retrieval was what I was trying to avoid
17:46 tumer slower on retrieval, very fast indexing overal acceptable and scalable
17:45 thd tumer: how well does that scale?
17:45 tumer the schema allows me to define them separately as xpaths
17:44 tumer my holdings records have their own 001 004 005 and 008 separately indexed
17:44 thd tumer: what method do you use?
17:44 thd tumer: I asked poorly the first time and now they are punishing me
17:44 tumer i already do that
17:43 tumer well you see you can index say 001 bibliographic separately from 001 holdings
17:43 thd tumer: Index Data has not answered
17:43 thd tumer: I asked a question about indexing on the koha-zebra list
17:43 tumer top-two meaning kohacollection and koharecord?
17:42 thd tumer: I just had not understood how the top two elements in your schema worked
17:42 tumer was it your question at yaz-list?
17:41 thd tumer: no I definitely approve
17:41 tumer you mean you do not approve?
17:41 thd tumer: that was what I had assumed
17:36 tumer you may now
17:36 tumer one sec
17:36 thd tumer: I cannot connect to that
17:36 tumer i already put it to production code at NEU
17:35 tumer schema is there together with koharaecod.xsd
17:35 tumer thd: look at http://library.neu.edu.tr/kohanamespace
17:34 thd tumer: hello. Would you explain the top two levels of your proposed XML schema to me?
17:33 tumer hi thd
17:31 thd tumer [A]: are you there?
15:09 owen Yeah, loads
15:08 Burgwork owen, having fun with css and IE?
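Piecing together tumer's descriptions in this log (the top-two elements kohacollection and koharecord, holdings with their own separately indexed 001/004/005/008, and the path koharecord/holdings/record/datafield(856)/subfield(a)), the meta-record presumably nests something like the sketch below. Only the element names mentioned in the chat are taken from tumer; the sample values and the MARCXML details are illustrative guesses, not the actual koharecord.xsd:

```xml
<!-- meta-record structure sketch reconstructed from the chat -->
<kohacollection xmlns:marc="http://www.loc.gov/MARC21/slim">
  <koharecord>
    <bibliographic>
      <marc:record type="Bibliographic">
        <marc:controlfield tag="001">1234</marc:controlfield>
      </marc:record>
    </bibliographic>
    <holdings>
      <marc:record type="Holdings">
        <marc:controlfield tag="001">5678</marc:controlfield>
        <marc:datafield tag="856" ind1=" " ind2=" ">
          <marc:subfield code="a">library.example.org</marc:subfield>
        </marc:datafield>
      </marc:record>
    </holdings>
  </koharecord>
</kohacollection>
```

Because the bibliographic 001 and the holdings 001 sit on different xpaths, an xpath-aware indexing stylesheet can send them to different indexes (biblionumber versus itemnumber), which is the "hybrid xpath indexing" tumer describes.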