IRC log for #koha, 2006-08-12

All times shown according to UTC.

Time S Nick Message
15:08 Burgwork owen, having fun with css and IE?
15:09 owen Yeah, loads
17:31 thd tumer [A]: are you there?
17:33 tumer hi thd
17:34 thd tumer: hello.  Would you explain the top two levels of your proposed XML schema to me?
17:35 tumer thd: look at http://library.neu.edu.tr/kohanamespace
17:35 tumer the schema is there together with koharecord.xsd
17:36 tumer i already put it to production code at NEU
17:36 thd tumer: I cannot connect to that
17:36 tumer one sec
17:36 tumer you may now
17:41 thd tumer: that was what I had assumed
17:41 tumer you mean you do not approve?
17:41 thd tumer: no I definitely approve
17:42 tumer was it your question at yaz-list?
17:42 thd tumer: I just had not understood how the top two elements in your schema worked
17:43 tumer top-two meaning kohacollection and koharecord?
17:43 thd tumer: I asked a question about indexing on the koha-zebra list
17:43 thd tumer: Index Data has not answered
17:43 tumer well you see you can index say 001 bibliographic separately from 001 holdings
17:44 tumer i already do that
17:44 thd tumer: I asked poorly the first time and now they are punishing me
17:44 thd tumer: what method do you use?
17:44 tumer my holdings records have their own 001 004 005 and 008 separately indexed
17:45 tumer the schema allows me to define them separately as xpaths
17:45 thd tumer: how well does that scale?
17:46 tumer slower on retrieval, very fast indexing; overall acceptable and scalable
17:47 thd tumer: slow retrieval was what I was trying to avoid
17:47 tumer the retrieval is slower cause i have to get xml and not marc-record
17:47 tumer but i could not find any other way
17:47 thd tumer: why is getting XML necessarily slower?
17:48 tumer lots of verbiage coming through
17:48 tumer about twice the size of marc-record
17:49 thd tumer: In my koha-zebra list message I quoted an Index Data zebra list message from February
17:49 tumer did i miss that?
17:49 thd tumer: In the message that I quoted, Marc tells Paul there is a fast method
17:50 thd tumer: Paul was satisfied with a slower method for what he had asked about indexing authorities at the time
17:51 tumer i dont remember this
17:53 thd tumer: http://lists.nongnu.org/archiv[…]-08/msg00001.html
17:53 thd tumer: my purpose is this http://wiki.koha.org/doku.php?[…]er_meta_record_db
17:59 tumer thd: marcxml already has type definitions for bibliographic, holdings, authorities, community etc. that you can set as a <record> attribute
17:59 thd tumer: yes but it does not have one for FRBR
18:00 thd or FRAR, FRASR, etc.
18:00 tumer using the new alvis filter of zebra and an xsl stylesheet i can index anything and now i can get DC, MODS and FRBR from my meta-record
18:01 tumer on the fly conversion to any type
18:01 thd tumer: FRBR is not a record type
18:01 thd at least not yet
18:02 tumer i know but there is a conversion route defined by loc
18:02 tumer from marcxml to frbr
18:03 thd tumer: yes but very primitive for works and expressions only and not efficient
18:03 tumer and not much important for me at this stage for koha
18:05 thd tumer: if Koha can provide a design for such a major preoccupation of library science for the past 10 years then it could be very important for Koha.
18:06 tumer yes and you may work and improve on this. not in my priorities list
18:06 thd tumer: but about what Marc told paul
18:07 tumer i am not quite sure what he said
18:07 thd tumer: did you see that part of my koha-zebra list message?
18:07 tumer yes i read it but did not understand MARC's suggestion
18:09 thd tumer: the problem paul was trying to answer was how to index $a in 200 differently from $a in 700
18:10 tumer well we already do dont we?
18:10 thd tumer: in MARCXML at the element level all $a look alike
18:10 tumer no they are melm 700$a and melm 200$a
18:10 thd tumer: I thought we were only indexing fields mostly not subfields specifically
18:11 tumer no we index at either level field or subfield by subfield
18:11 tumer i index 952 at 25 different subfield level
18:12 thd tumer: what method do we use to distinguish the subfields from different fields?
18:13 tumer syntax is melm 700$a  Author melm 700$d Date etc.
18:13 tumer or just melm 245 Title
18:14 thd tumer: and that works for MARCXML in addition to MARC?
18:14 tumer yes
18:14 tumer although i do not use that any more. Thats dev-week indexing
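[editor's note] The melm lines quoted above are directives from Zebra's record.abs configuration, which maps MARC fields and subfields to named indexes. A minimal sketch of such a file, using only the mappings mentioned in this conversation (comments are mine):

```text
# record.abs fragment for Zebra's MARC/GRS-1 filter -- the "dev-week" style
# indexing discussed above; only mappings quoted in this log are shown
melm 245     Title      # whole field 245 indexed as Title
melm 700$a   Author     # subfield a of field 700 indexed as Author
melm 700$d   Date       # subfield d of field 700 indexed as Date
```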
18:14 thd tumer: What do you use?
18:15 tumer koharecord/holdings/record/datafield(856)/subfield(a)
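[editor's note] Read literally, that path implies a wrapper document shaped roughly like the following. This is a hypothetical sketch assembled only from the element names that appear in this conversation (kohacollection, koharecord, holdings, record); the real koharecord.xsd on tumer's server is authoritative and may differ:

```xml
<kohacollection>
  <koharecord>
    <!-- the MARCXML bibliographic record, with its own 001, 245, ... -->
    <record>...</record>
    <holdings>
      <!-- each MARCXML holdings record keeps its own 001, 004, 005, 008, 856 -->
      <record>...</record>
    </holdings>
  </koharecord>
</kohacollection>
```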
18:16 thd tumer: is retrieval efficiency a function of the XPATH length in any way?
18:17 tumer i am not quite sure yet
18:18 thd tumer: the answer for efficiency was a focus of my koha zebra list question
18:19 thd tumer: and if you look closely I want to be able to index UNIMARC and MARC 21 and anything else in the same DB.
18:20 tumer same server but different db names will do
18:21 thd tumer: I want to put them in the same meta-record
18:21 tumer that is possible
18:21 thd and separate them by a different XPATH
18:21 tumer then you can index any part of meta-record as you wish
18:22 thd tumer: I am trying to find the maximally efficient method for indexing
18:22 thd tumer
18:22 tumer thats what i do for holdings
18:22 thd tumer: not merely something that can work
18:23 thd tumer: I think it will be very useful for you to have related authorities in the same meta-record as a bibliographic record
18:23 tumer yazproxy converts from marc21 to unimarc on the fly
18:24 tumer and duplicate every authority for 1000 times?
18:24 thd tumer: really, does it do the inverse as well?
18:24 tumer yes it uses usemarcon utility from any marc to anymarc
18:25 thd tumer: yes many duplicates but very efficient retrieval by authority tracings and references
18:26 thd tumer: I have usemarcon but I have no functioning configuration files for doing any conversion
18:27 tumer well british library provides the one for unimarc both ways
18:28 thd tumer: I downloaded from a BL link and they left it out with a nice note saying that I could buy commercial support from the original developer if I wanted more default features
18:29 tumer typical brits
18:29 thd tumer: I downloaded it last about a year ago
18:29 thd tumer: do you understand the indexing advantage of having related copies of authorities in the same meta-record as bibliographic records?
18:29 tumer try again i think now they have it
18:30 tumer i know but i have to look into some benchmarks
18:31 thd tumer: including authorities in the meta-record should give you the same advantage as including holdings in the meta-record with bibliographic records
18:33 thd tumer: given the indexing limitations of Zebra including authorities in the meta-record seems to me the only way to do interesting things with authorities
18:34 tumer searching the authorities (separately) during retrieval is fast enough already
18:35 thd tumer: yes if you only want to retrieve matches from one authority at a time
18:36 thd tumer: yet suppose I do a subject search and want to sort by the ones with the largest number of holdings or some other factor of most used
18:37 tumer well its endless i know
18:38 thd tumer: If my result set is small it may be manageable but if my result set is large it is the same problem as with 10,000 biblio matches and knowing which are in a particular library
18:39 thd tumer: storing copies of related records together should solve that problem at the indexing level
18:39 thd together in the same meta-record
18:40 tumer thd: i understand the question. It did not pose a problem to me yet. if you have such a need then you have to include them in your meta-record
18:40 thd tumer: your users are not demanding enough
18:41 thd tumer: you need me breaking your system with a single query
18:41 tumer my users only take what i give them
18:42 thd tumer: I used to cause stack overflows at a fairly distant library's circulation system by borrowing too much at one time to avoid the 5 hour round trip commute
18:44 thd tumer: do you understand well what FRBR does?
18:44 thd tumer: I have a simple power point for you
18:45 tumer i have seen a powerpoint at oclc i think it was
18:45 thd yes this one http://www.oclc.org/research/p[…]eill/frbrddb2.ppt
18:45 tumer power fights
18:46 thd tumer: try this though which includes FRAR and FRSAR http://www.kaapeli.fi/~fla/frb[…]ject%20access.pdf
18:47 tumer well nice talking to you thd, i'll check those later i have to go to sleep now. G'night!
18:48 thd tumer: look at that pdf link when you are awake
18:48 thd tumer: good night
18:48 tumer will
01:10 ai hi,
01:11 ai anyone here? can you give me some help plz
02:37 hdl hi
02:37 osmoze hello all
02:37 toins hello
08:54 thd kados: are you there?
08:57 kados thd: of course :-)
08:58 thd kados: did you see the conversation I had with tumer last night in the logs?
08:58 kados no
08:59 paul kados, take time to read my mail on koha-devel & tell me what you think of my suggestion.
08:59 paul (& good morning to kados & thd)
08:59 thd kados: tumer has implemented his schema for bibliographic and holdings records in one record
08:59 thd good morning paul
09:00 thd kados: he's indexing fields differently with xpath
09:01 thd paul: I do not find any abs files using xpath in CVS.  Am I looking in the wrong place?
09:02 thd kados: I am suspecting that tumer has not yet committed his recent improvements
09:03 thd kados: I only see devel-week related files for zebra in CVS
09:05 thd kados: are you still there?
09:19 kados paul: did you see mj's response?
09:19 paul not yet arrived.
09:19 kados paul: i agree with much of what he says there
09:19 kados wait ...
09:20 kados that was your response to MJ :-)
09:20 paul ah, ok
09:20 kados yes, I very much agree
09:20 kados I will try to respond today
09:20 kados phone running off the hook today ...
09:21 kados thd: he hasn't committed his stuff yet
09:21 thd kados: yes I just checked the koha-cvs logs
09:49 thd slef: you fix users instead of problems?
09:50 paul thd : lol
09:50 thd slef: are users the problem?
09:52 thd tumer: are you really here
09:52 tumer yes looking for paul
09:52 paul hi tumer, i'm here
09:52 paul (as well as toins)
09:52 thd tumer: do you use xpath in your queries
09:52 tumer hi paul and toins and thd
09:52 toins hi tumer
09:52 dewey hi tumer is still strugling
09:53 thd hello tumer
09:53 tumer paul: why dont we have the structure as it installs
09:53 tumer i always thought you keep cvs this way to make it DIFFICULT to install
09:54 paul tumer : I agree with you, as what I suggest is almost the install structure in fact ;-)
09:54 tumer my installer makes koha ->intranet &opac
09:54 thd tumer: you mean that you want CVS to be organised the same as the install?
09:54 paul (I just separate htdocs & non htdocs for templates)
09:55 paul thd : yep.
09:55 tumer intranet -> cgi-bin & htdocs
09:55 tumer opac -> cgi-bin & htdocs
09:55 paul tumer: i'm not sure we need this additional level, as it will just contain 2 sub dirs
09:55 paul so, 2x2 or 4 ?
09:55 paul I think 4 is OK
09:55 tumer so separate htdocs for both opac and intranet
09:56 paul yes, in my structure they are.
09:56 tumer well the 2.2 installer does that on windows
09:56 tumer so why not have it as it should be once installed as well from the beginning
09:57 thd paul: Do you mean that your original suggestion was to have CVS organised the same as the install?
09:57 tumer so even untaring it should be enough if you are not running upgrade
09:57 tumer yes thats what i am saying
09:57 paul thd : yes indeed.
09:58 paul but maybe we should continue to speak of this on the mailing list, to let others express their opinion
09:58 tumer k
09:59 thd paul: Are you suggesting changing both CVS and the install to something matching, keeping neither the same as now?
09:59 paul thd : look at my mail on koha-devel, i hope it's self explanatory
09:59 thd tumer: before you leave, do you use xpath in your queries?
09:59 paul (at least I really hope, because it needed almost 1 hour to write !)
09:59 tumer paul: i think we have not solved the utf8 problem either. We just managed to get over it
10:00 paul thd: nope
10:00 tumer thd: i use xsl stylesheets with xpath indexing
10:00 paul hdl may be interested by this tumer, as he should play with encoding problems next weeks
10:01 thd tumer: so your queries are no different than without XPATH?
10:01 tumer i had problems using XML::LibXML, same problem we had with MARC::File::USMARC
10:02 tumer so probably we are missing something
10:02 tumer whats wrong with irc, am i on or not?
10:02 thd tumer: but you have no queries like find @attr 1=/*/datafield[@tag='245']/subfield[@code='a'] someterm
10:03 thd tumer: you are on now
10:03 thd tumer: I just posted a hypothetical example query from Sebastian in February
10:04 tumer thd: no i still use the old way of searching
10:04 tumer i index every field that i will require
10:04 tumer even if they do not exist now
10:04 thd tumer: when do you plan to commit your abs file?
10:04 tumer all 3.0 stuff
10:05 tumer it will not change the way you query though
10:05 thd tumer: I understand but I want to see exactly so that I can understand perfectly what you are doing
10:06 tumer i will commit when toins and paul are ready
10:06 thd tumer: why wait for them?
10:06 tumer thats whats agreed
10:06 thd tumer: I am ready :)
10:06 tumer synching and so on
10:07 thd tumer: what does it harm if you commit early.  Oh, does that harm synching?
10:07 tumer thd: yes, at the IRC meeting thats what was agreed
10:08 thd tumer: would you commit your abs file to me?
10:08 tumer there is no more abs file, a whole bunch of xsl files
10:09 thd tumer: you use xsl files for indexing?
10:09 tumer yes
10:09 thd tumer: is that documented?
10:09 tumer ID zebra documented
10:10 thd tumer: yes, is that in the ID Zebra documentation?
10:10 tumer yes thats what i said
10:10 thd tumer: ok, I have not read it thoroughly enough
10:11 tumer get the new one from their cvs
10:11 thd tumer: let me give a good example for the trivial one I gave last night
10:11 thd tumer: maybe I am using an out of date set of documentation
10:12 thd tumer: in authorities you can search by references and tracings to search for the authorised form using non-authorised forms
10:13 thd tumer: currently to fill authorised headings in the search form a separate search must be done for each authorised heading
10:15 thd tumer: that is a good careful but not extra quick way of performing searches using authorities
10:17 thd tumer: it is also often necessary because the user may never guess the authorised heading successfully unless the user is a librarian with years of experience or otherwise especially familiar with the authorised headings needed
10:17 dewey okay, thd.
10:18 thd tumer: however, there could be an option to search the authorities references and tracings directly from the search form collectively
10:18 thd tumer: so instead of building the query slowly for more than one authorised heading the user types in whatever terms come to mind
10:19 thd tumer: the only way that would work for indexing is if the meta-record contained authorities
10:20 tumer but the bibliographic record already has 650 with authorities filled in
10:20 thd tumer: this would allow finding records with the conjunction of two subject headings without knowing the precise headings
10:20 tumer or 100 or 700 for that matter
10:21 thd tumer: yes but the 650 100 700 only contains the authorised heading
10:21 tumer what else are we looking for, enlighten me
10:22 thd tumer: with authorities you can find authorised headings by searching the 4XX 5XX in authority records for non-authorised forms
10:22 tumer or whatever
10:22 tumer i see
10:22 thd tumer: so for example maize is a food plant native to North America
10:23 thd tumer: maize is no longer the authorised heading under LCSH
10:23 tumer so you want to find maize by searching corn
10:23 thd tumer: yes
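[editor's note] The corn/maize exchange can be made concrete with a small sketch. This is not Koha code; the data structure and function names are invented for illustration. It shows the expansion thd is proposing: match a user's term against both the authorised heading (MARC authority 1XX) and its see-from tracings (4XX), and return the authorised headings to search under:

```python
# Hypothetical sketch of authority expansion via "see from" tracings.
# In MARC authority records the authorised heading lives in 1XX and the
# non-authorised (see-from) forms in 4XX; the dicts below stand in for that.
AUTHORITIES = [
    {"authorised": "Corn", "see_from": ["Maize", "Indian corn"]},
    {"authorised": "Solanum tuberosum", "see_from": ["Potato"]},
]

def expand_term(term):
    """Return authorised headings whose 1XX or 4XX forms match the term."""
    term = term.lower()
    hits = set()
    for auth in AUTHORITIES:
        forms = [auth["authorised"]] + auth["see_from"]
        if any(term == form.lower() for form in forms):
            hits.add(auth["authorised"])
    return sorted(hits)

# A search for the obsolete form still lands on the authorised heading:
print(expand_term("maize"))  # -> ['Corn']
```

Expanding every query term collectively this way is what motivates keeping authorities in (or indexed alongside) the same meta-record, rather than issuing one authority search per heading.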
10:24 thd tumer: I can do that now for more than one authority by building the query one authority at a time
10:24 tumer the proxy or irc is blocking me today
10:25 thd tumer: that is the slow careful good method
10:26 thd tumer: users growing up with Google are unlikely to have patience to be slow and careful most of the time
10:26 tumer i see where you are heading thd
10:27 thd tumer: the system can still give people good results faster and they can still use authorities to refine their query afterwards
10:29 thd tumer: this is not quite FRSAR it is basic linking that any system should be able to do yet only Sirsi Unicorn does to my knowledge
10:30 thd tumer: paul just wants to be certain that behaviour is not default for his users so as not to surprise them when the result set does not match the query terms precisely
10:31 thd tumer: this is merely an explicitly defined relationship contained in library systems records
10:32 thd tumer: some FR*R relationships are not explicitly defined and would require something extra
10:33 tumer my experience is we are giving too many answers to the user; they prefer fewer, precise answers
10:33 thd tumer: they can always have that option
10:34 thd tumer: yet, most users fail to do successful subject searches because they seldom choose the correct authorised terms
10:35 thd tumer: the Google mentality is that any results are good enough
10:36 thd tumer: library systems allow better results than Google
10:36 thd tumer: we should not deprive users of better results
10:38 thd tumer: if my very precise but uninformed (ignorant of the actual database content for authorised forms) query returns 3 bib records I may or may not be satisfied.
10:39 thd tumer: if the best record for addressing some problem that I am trying to solve is in a larger set with 10 records then I need the 10 record result set and 3 records was insufficient
10:41 thd tumer: subject searches with a small number of query terms in a large collection tend to give much larger result sets and the user needs help from the system.
10:42 thd tumer: do you not see that as a significant advantage for every user as long as the user has the option of turning the behaviour on or off for the query?
10:43 tumer well it is an advanced system
10:43 thd tumer: Koha is already an advanced system
10:44 thd tumer: what is difficult about adding copies of authorities to meta-records
10:44 tumer i did not say difficult, its feasible
10:45 thd tumer: paul has had the references and tracings working for building queries slowly for a couple of years
10:45 thd tumer: what is your hesitation over feasible?
10:46 tumer why dropped out?
10:46 thd tumer: paul has had the references and tracings working for building queries slowly for a couple of years
10:46 tumer having a too big record to index is my concern
10:46 thd tumer: do you think that would be a performance problem?
10:47 tumer thats the only concern i have
10:48 thd tumer: that is also my concern which is why I asked Index Data about the efficiency of XPATH indexing
10:48 thd tumer: they have not responded
10:49 tumer they have answered before saying its slow on indexing
10:49 thd tumer: I did not ask my question of Index Data correctly the first time
10:50 thd tumer: maybe there could be a supplementary database with larger meta-records which was slower to index
10:50 tumer you could not do this with existing version of zebra anyway. only forthcoming zebra
10:51 thd tumer: why, what is the problem?
10:51 tumer no meta-record indexing was possible
10:51 thd tumer: Does your meta-record work?
10:52 tumer yes but with cvs zebra
10:52 thd tumer: so I understand that this only works in CVS now
10:52 tumer not released yet
10:52 tumer yes
10:53 thd tumer: yet, it is intended for release in due course
10:53 tumer hope so
10:53 thd :)
10:54 thd tumer: what would be wrong with having a supplementary database of records which was slower to index
10:54 tumer i was thinking along the same lines
10:55 thd tumer: those could be updated by a batch process while the smaller records were updated in real time
10:55 tumer correct
10:55 thd tumer: well then we are thinking along similar paths
10:56 tumer i even slowed my realtime updating to within 2 minutes, safer on zebra db
10:57 thd tumer: Joshua was willing to rephrase my XPATH indexing question
10:58 tumer i already answered your xpath question i thought
10:58 thd tumer: we should have some answer from Index Data about how to get maximum performance from XPATH
10:58 tumer i index xpath with xslt stylesheets
10:58 thd tumer: my question is really about whether shorter XPATHs make a difference in performance
10:59 tumer at indexing it makes it faster not for searching
10:59 thd tumer: yet that is important
11:00 tumer but having said that if you xpath everything then i think it will be slow and cumbersome
11:00 thd tumer: we might be able to design meta-records with shorter XPATHs
11:00 thd tumer: do you not XPATH everything now?
11:01 tumer i have the shortest path to bibliographic record keeping in sync with marc21
11:01 tumer no i do not
11:01 tumer i do not index everything
11:02 tumer similar to record.abs
11:02 thd tumer: so you still have elem 100$a sometimes?
11:02 tumer i choose what to index
11:02 tumer similar structure
11:02 tumer i choose which paths to index
11:02 tumer and only those
11:04 thd tumer: I do not quite understand what you mean by choosing which paths to index except as opposed to indexing every arbitrary and unneeded path
11:04 tumer so its a hybrid xpath indexing. only xpath allows me to index the same datafields with different indexes
11:04 tumer 001 bibliographic for biblionumber 001 holdings for itemnumber etc
11:05 tumer cause they have different xpaths
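[editor's note] A sketch of what such a stylesheet might look like: the same MARC tag (001) is pulled into two different indexes purely by XPath. The z: namespace and attribute spelling follow my recollection of Zebra's alvis filter documentation and, like the index names and match paths, should be treated as assumptions rather than tumer's actual files:

```xml
<?xml version="1.0"?>
<!-- Hypothetical alvis-filter indexing stylesheet; all names illustrative -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:z="http://indexdata.dk/zebra/xslt/1"
    xmlns:marc="http://www.loc.gov/MARC21/slim">
  <xsl:template match="/">
    <z:record>
      <!-- bibliographic 001 -> biblionumber index -->
      <z:index name="biblionumber" type="w">
        <xsl:value-of select="/kohacollection/koharecord/record/marc:controlfield[@tag='001']"/>
      </z:index>
      <!-- holdings 001 -> itemnumber index -->
      <z:index name="itemnumber" type="w">
        <xsl:value-of select="/kohacollection/koharecord/holdings/record/marc:controlfield[@tag='001']"/>
      </z:index>
    </z:record>
  </xsl:template>
</xsl:stylesheet>
```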
11:05 thd tumer: so if the same datafield is not being indexed you use the devel_week method of elem 100$a or whatever because it is faster?
11:06 tumer yes but elem 100$a does not distinguish different datafields, this does
11:08 tumer i do not use xpath-enabled indexing, it is slow, and so says the ID documentation
11:08 thd tumer: ahh, so your xslt method is much faster
11:09 tumer yes
11:09 tumer 100K metarecords less than 10 min
11:10 thd tumer: I had assumed the slowness for xpath enabled was a function of allowing xpath in queries
11:11 tumer no it is slow in indexing not in retrieval
11:11 thd tumer: Marc wrote that you could speed things up with xpath enabled by indexing xpaths
11:12 tumer xpath enabled makes bigger indexes and is slow
11:12 tumer but he suggested xelem which does not exist
11:12 tumer not out of the box, pay ID and they will write it
11:13 thd tumer: yes, Sebastian corrected him.  But I misunderstood what out of the box meant.
11:14 tumer if there is nothing else i have to go for dinner now
11:14 thd tumer: would you zip or tar your xslt files so that I can see them?
11:15 tumer one sec
11:15 thd tumer: I would like to see everything related to how you are indexing now
11:16 thd tumer: I want to understand this perfectly
11:17 tumer http://library.neu.edu.tr/koha[…]ce/koha2index.xsl
