Time Nick Message
11:17 tumer http://library.neu.edu.tr/kohanamespace/koha2index.xsl
11:16 thd tumer: I want to understand this perfectly
11:15 thd tumer: I would like to see everything related to how your are indexing now
11:15 tumer one sec
11:14 thd tumer: would you zip or tar your xslt files so that I can see them?
11:14 tumer if there is nothing else i have to go for dinner now
11:13 thd tumer: yes, Sebastian corrected him. But I misunderstood what out of the box meant.
11:12 tumer not out of the box, pay ID and they will write it
11:12 tumer but he suggested xelem which does not exist
11:12 tumer xpath enabled makes bigger indexes and is slow
11:11 thd tumer: Marc wrote that you could speed things up with xpath enabled by indexing xpaths
11:11 tumer no it is sloe in indexing not in retrieval
11:10 thd tumer: I had assumed the slowness for xpath enabled was a function of allowing xpath in queries
11:09 tumer 100K metarecords less than 10 min
11:09 tumer yes
11:08 thd tumer: ahh, so your xslt method is much faster
11:08 tumer i do not use xpath enabled indexing it is slow and so it says ID documentation
11:06 tumer yes but elem100$a does not distingush differnt datafields this does
11:06 dewey i already had it that way, thd.
11:06 thd tumer: so if the same datafield is not being indexed you use the devel_week method of elem 100$a or whatever because it is faster?
11:05 thd tumer: so if the same datafield is not being indexed you use the devel_week method of elem 100$a or whatever because it is faster?
11:05 tumer cause they have differnt xpaths
11:04 tumer 001 bibliographic for biblionumber 001 holdings for itemnumber etc
11:04 tumer so its a hybrid xpath indexing. xpath only allows me to index same datafields with differnt indexes
11:04 thd tumer: I do not quite understand what you mean by choosing which paths to index except as opposed to indexing every arbitrary and unneeded path
11:02 tumer and only those
11:02 tumer i choose which paths to index
11:02 tumer similar structure
11:02 tumer i choose what to index
11:02 thd tumer: so you still have elem 100$a sometimes?
11:02 tumer similar to record.abs
11:01 tumer i do not index everything
11:01 tumer no i do not
11:01 tumer i have the shortest path to bibliographic record keeping in sync with marc21
11:01 thd tumer: do you not XPATH everything now?
11:00 thd tumer: do you not XPATH everything now?
11:00 thd tumer: we might be able to design meta-records with shorter XPATHs
11:00 tumer but having said that if you xpath everything than i think it will be slow and cumbersome
10:59 thd tumer: yet that is important
10:59 tumer at indexing it makes it faster not for searching
10:58 thd tumer: my question is really about whether shorter XPATHs make a difference in performance
10:58 tumer i index xpath with xslt stylesheets
10:58 thd tumer: we should have some answer from Index Data about how to get maximum performance from XPATH
10:58 tumer i already answered your xpath question i thought
10:57 thd tumer: Joshua was willing to rephrase my XPATH indexing question
10:56 tumer i even slowed my realtime updating to within 2 minutes, safer on zebra db
10:55 thd tumer: well then we are thinking along similar paths
10:55 tumer correct
10:55 thd tumer: those could be updated by a batch process while the smaller records were updated in real time
10:54 tumer same
10:54 tumer i was thinking along the smae lines
10:54 thd tumer: what would be wrong with having a supplementary database of records which was slower too index
10:53 thd :)
10:53 tumer hope so
10:53 thd tumer: yet, It is intended for release in due course
10:52 tumer yes
10:52 tumer not released yet
10:52 thd tumer: so I understand that this only works in CVS now
10:52 tumer yes but with cvs zebra
10:51 thd tumer: Does your meta-record work?
10:51 tumer no meta-record indexing was possible
10:51 thd tumer: why, what is the problem?
10:50 tumer you could not do this with existing version of zebra anyway. only forthcoming zebra
10:50 thd tumer: maybe there could be a supplementary database with larger meta-records which was slower to index
10:49 thd tumer: I did not ask my question of Index Data correctly the first time
10:49 tumer they have answered before saying its slow on indexing
10:48 thd tumer: they have not responded
10:48 thd tumer: that is also my concern which is why I asked Index Data about the efficiency of XPATH indexing
10:47 tumer thats the only concern i have
10:46 thd tumer: do you think that would be a performance problem?
10:46 tumer having a too big record to index is my concern
10:46 thd tumer: paul has had the references and tracings working for building queries slowly for a couple of years
10:46 tumer why dropped out?
10:45 thd tumer: what is your hesitation over feasible?
10:45 thd tumer: paul has had the references and tracings working for building queries slowly for a couple of years
10:44 tumer i did not say difficult, its feasable
10:44 thd tumer: what is difficult about adding copies of authorities to meta-records
10:43 thd tumer: Koha is already an advanced system
10:43 tumer well it is an advanced system
10:42 thd ?
10:42 thd tumer: do you not see that as a significant advantage for every user as long as the user has the option of turning the behaviour on or off for the query
10:41 thd tumer: subject searches with a small number of query terms in a large collection tend to give much larger result sets and the user needs help from the system.
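The XSLT-driven indexing tumer describes above ("i index xpath with xslt stylesheets", and the koha2index.xsl link at 11:17) follows the pattern of Zebra's alvis filter, where a stylesheet maps chosen MARCXML xpaths to named indexes. A minimal sketch of what such a stylesheet might look like, based on the Zebra manual's alvis examples; the index name and the choice of 245$a are illustrative, not taken from tumer's actual file:

```xml
<!-- koha2index.xsl-style sketch (illustrative, not tumer's file) -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:z="http://indexdata.dk/zebra/xslt/1"
    xmlns:marc="http://www.loc.gov/MARC21/slim">

  <!-- Wrap output in z:record; Zebra's alvis filter reads z:index children -->
  <xsl:template match="/">
    <z:record>
      <xsl:apply-templates/>
    </z:record>
  </xsl:template>

  <!-- The match xpath decides which datafield feeds which register, which is
       why 100$a and 700$a (or bibliographic 001 and holdings 001) can be
       indexed differently, and why only the chosen paths get indexed -->
  <xsl:template match="marc:datafield[@tag='245']/marc:subfield[@code='a']">
    <z:index name="title:w"><xsl:value-of select="."/></z:index>
  </xsl:template>

  <!-- Silence all other text nodes -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

Only the paths with explicit templates produce index terms, which matches tumer's "i choose which paths to index and only those".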
10:39 thd tumer: if the best record for addressing some problem that I am trying to solve is in a larger set with 10 records then I need the 10 record result set and 3 records was insufficient
10:38 thd tumer: if my very precise but uninformed (ignorant of the actual database content for authorised forms) query returns 3 bib records I may or may not be satisfied.
10:36 thd tumer: we should not deprive users of better results
10:36 thd tumer: library systems allow better results than Google
10:35 thd tumer: the Google mentality is that any results are good enough
10:34 thd tumer: yet, most users fail to do successful subject searches because they seldom choose the correct authorised terms
10:33 dewey i already had it that way, thd.
10:33 thd tumer: paul just wants to be certain that behaviour is not default for his users so as not to surprise them when the result set does not match the query terms precisely
10:33 thd tumer: they can always have that option
10:33 tumer my experience is we are giving too many answers to the user they prefer lesser precise answers
10:32 thd tumer: some FR*R relationships are not explicitly defined and would require something extra
10:31 thd tumer: this is merely an explicitly defined relationship contained in library systems records
10:30 thd tumer: paul just wants to be certain that behaviour is not default for his users so as not to surprise them when the result set does not match the query terms precisely
10:29 thd tumer: this is not quite FRSAR it is basic linking that any system should be able to do yet only Sirsi Unicorn does to my knowledge
10:27 thd tumer: the system can still give people good results faster and they can still use authorities to refine their query afterwords
10:26 tumer i see where you are heading thd
10:26 thd tumer: users growing up with Google are unlikely to have patience to be slow and careful most of the time
10:25 thd tumer: that is the slow careful good method
10:24 tumer the proxy or irc is blocking me today
10:24 thd tumer: I can do that now for more than one authority by building the query one authority at a time
10:24 thd tumer: I can do that now for more than one authority by building the query one authority at a time
10:23 thd tumer: yes
10:23 tumer so you want to find maize by searching corn
10:23 thd tumer: maize is no longer the authorised heading under LCSH
10:22 thd tumer: so for example maize is a food plant native to North America
10:22 tumer i see
10:22 tumer or whatever
10:22 thd tumer: with authorities you can find authorised headings by searching the 4XX 5XX in authority records for non-authorised forms
10:22 tumer s/enligten/
10:21 tumer what else are we looking for, enlight me
10:21 thd tumer: yes but the 650 100 700 only contains the authorised heading
10:20 tumer or 100 or 700 for that matter
10:20 thd tumer: this would allow finding records with the conjunction of two subject headings without knowing the precise headings
10:20 tumer but bibliographic record alraedy has 650 with authorities filled in
10:19 thd tumer: the only way that would work for indexing is if the meta-record contained authorities
10:18 thd tumer: so instead of building the query slowly for more than one authorised heading the user types in whatever terms come to mind
10:18 thd tumer: however, there could be an option to search the authorities references and tracings directly from the search form collectively
10:17 dewey okay, thd.
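What thd proposes with the maize/corn example amounts to query expansion through authority see-from references: a 4XX tracing ("Maize") leads to the authorised 1XX heading ("Corn"), and the search then covers both. A toy sketch of that idea; the authority data and function names here are invented for illustration and are not Koha's API:

```python
# Toy sketch of authority-based query expansion: map see-from (4XX)
# tracings to the authorised heading (1XX), then search with both.
# The authority records below are invented; Koha/Zebra would supply real ones.

AUTHORITIES = [
    # (authorised 150 heading, see-from 450 tracings)
    ("Corn", ["Maize", "Indian corn"]),
    ("Automobiles", ["Cars (Automobiles)", "Motorcars"]),
]

def expand(term):
    """Return the set of search terms implied by an input term:
    the term itself plus any authorised heading whose tracings match."""
    terms = {term.lower()}
    for heading, tracings in AUTHORITIES:
        if term.lower() in (t.lower() for t in tracings):
            terms.add(heading.lower())
    return terms

print(sorted(expand("maize")))   # prints ['corn', 'maize']
```

A user typing "maize" thus also retrieves records catalogued under the authorised heading "Corn", without building the query one authority at a time.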
10:17 thd tumer: it is also often necessary because the user may never successfully guess the authorised heading successfully unless the user is a librarian with years of experience or otherwise especially familiar with the authorised headings needed
10:15 thd tumer: that is a good careful but not extra quick way of performing searches using authorities
10:14 thd tumer: currently to fill authorised headings in the search form a separate search must be done for each authorised heading
10:13 thd tumer: currently to fill authorised headings a separate search must be done for each authorised heading
10:12 thd tumer: in authorities you can search by references and tracings to search for the authorised form using non-authorised forms
10:11 thd tumer: maybe I am using an out of date set of documentation
10:11 thd tumer: let me give a good example for the trivial one I gave last night
10:11 tumer get the new one from their cvs
10:10 thd tumer: ok, I have not read it thoroughly enough
10:10 tumer yes thats what i said
10:10 thd tumer: yes, is that in the ID Zebra documentation?
10:09 tumer ID zebra documented
10:09 thd tumer: is that documented?
10:09 tumer yes
10:09 thd tumer: you use xsl files for indexing?
10:08 tumer there is no more abs file, a whole bunch of xsl files
10:08 thd tumer: would you commit your abs file to me?
10:07 tumer thd:yes on IRC meeting thats whats agrred
10:07 thd tumer: what does it harm if you commit early. Oh, does that harm synching?
10:06 tumer synching and so on
10:06 thd tumer: I am ready :)
10:06 tumer that whats agreed
10:06 thd tumer: why wait for them?
10:06 tumer i will commit when toins and paul are ready
10:05 thd tumer: I understand but I want to see exactly so that I can understand perfectly what you are doing
10:05 tumer it will not change the way you query though
10:04 tumer all 3.0 stuf
10:04 thd tumer: when do you plan to commit your abs file?
10:04 tumer even if they do not exist now
10:04 tumer i index every field that i will require
10:04 tumer thd: no i still use the old way of searching
10:03 thd tumer: I just posted a hypothetical example query from Sebastian in February
10:03 thd tumer: you are on now
10:02 thd tumer: but you have no queries like find @attr 1=/*/datafield[@tag='245']/subfield[@code='a'] someterm
10:02 tumer whtas wrong with irc am i on or not?
10:02 tumer so probably we are missing something
10:01 tumer i had problems using XML::Libxml same problem we had with MARC::File::Usmarc
10:01 thd tumer: so your queries are no different than without XPATH?
10:00 paul hdl may be interested by this tumer, as he should play with encoding problems next weeks
10:00 tumer thd: i use xsl stylesheets with xpath indexing
10:00 paul thd: nope
09:59 tumer paul: i think we have not solved the utf8 problem either. We just managed to get over it
09:59 paul (at least I really hope, because it needed almost 1 hour to write !)
09:59 thd tumer: before you leave, do you use xpath in your queries?
09:59 paul thd : look at my mail on koha-devel, i hope it's self explanatory
09:59 thd paul: Are suggesting changing both CVS and install then to something matching, not keeping either the same as now?
09:58 tumer k
09:58 paul but maybe we should continue to speak of this on mailing list, to let other express their opinion
09:57 paul thd : yes indeed.
09:57 tumer yes thats what i am saying
09:57 tumer so even untaring it should be enough if you are not running upgrade
09:57 thd paul: Do you mean that your original suggestion was to have CVS organised the same as the install?
09:56 tumer so why not have it as it should be once installed as well from the beginning
09:56 tumer well 2_2 installer does that on windows
09:56 paul yes, in my structure they are.
09:55 tumer so separate htdocs for both opac and intranet
09:55 paul I think 4 is OK
09:55 paul so, 2x2 or 4 ?
09:55 paul tumer: i'm not sure we need this additional level, as it will just contain 2 sub dirs
09:55 tumer opac -> cgi-bin & htdocs
09:55 tumer intanet -> cgi-bin & htdocs
09:55 paul thd : yep.
09:54 paul (I just separate htdocs & non htdocs for templates)
09:54 thd tumer: you mean that you want CVS to be organised the same as the install?
09:54 tumer my installer makes koha ->intranet &opac
09:54 paul tumer : I agree with you, as what I suggest is almost the install structure in fact ;-)
09:53 tumer i always thought you keep cvs this way to make it DIFFICULT to install
09:53 tumer paul: why dont we have the structure as it installs
09:53 thd hello tumer
09:52 dewey hi tumer is still strugling
09:52 toins hi tumer
09:52 tumer hi paul and toins and thd
09:52 thd tumer: do you use xpath in your queries
09:52 paul (as well as toins)
09:52 paul hi tumer, i'm here
09:52 tumer yes looking for paul
09:52 thd tumer: are you really here
09:50 thd slef: are users the problem?
09:50 paul thd : lol
09:49 thd slef: you fix users instead of problems?
09:21 thd kados: yes I just checked the koha-cvs logs
09:21 kados thd: he hasn't committed his stuff yet
09:20 kados phone running off the hook today ...
09:20 kados I will try to respond today
09:20 kados yes, I very much agree
09:20 paul ah, ok
09:20 kados that was youre response to MJ :-)
09:19 kados wait ...
09:19 kados paul: i agree with much of what he says there
09:19 paul not yet arrived.
09:19 kados paul: did you see mj's response?
09:05 thd kados: are you still there?
09:03 thd kados: I only see devel-week related files for zebra in CVS
09:02 thd kados: I am suspecting that tumer has not yet committed his recent improvements
09:01 thd paul: I do not find any abs files using xpath in CVS. Am I looking in the wrong place?
09:00 thd kados: his indexing fields differently with xpath
08:59 thd good morning paul
08:59 thd kados: tumer has implemented his schema for bibliographic and holdings records in one record
08:59 paul (& good morning to kados & thd)
08:59 paul &kados, take time to read my mail on koha-devel & what you think of my suggestion.
08:58 kados no
08:58 thd kados: did you see the conversation I had with tumer last night in the logs?
08:57 kados thd: of course :-)
08:54 thd kados: are you there?
02:37 toins hello
02:37 osmoze hello all
02:37 hdl hi
01:11 ai any here ?? can give me some help plz
01:10 ai hi,
18:48 tumer will
18:48 thd tumer: good night
18:48 thd tumer: look at that pdf link when you are awake
18:47 tumer well nice talking to you thd, i'll check those later i have to go to sleep now. G'night!
18:46 thd tumer: try this though which includes FRAR and FRSAR http://www.kaapeli.fi/~fla/frbr05/delseyModeling%20subject%20access.pdf
18:45 tumer power fights
18:45 thd yes this one http://www.oclc.org/research/presentations/oneill/frbrddb2.ppt
18:45 tumer i have seen a powerpoint at oclc i think it was
18:44 thd tumer: I have a simple power point for you
18:44 thd tumer: do you understand well what FRBR does?
18:42 thd tumer: I used to cause stack overflows at a fairly distant libraries circulation system by borrowing too much at one time to avoid the 5 hour round trip commute
18:41 tumer my user only take what i give them
18:41 thd tumer: you need me breaking your system with a single query
18:40 thd tumer: your users are not demanding enough
18:40 tumer thd:i understand the question. It did not pose a problem to me yet. if you have such a need than you have to include them in your meta-record
18:39 thd together in the same meta-record
18:39 thd tumer: storing copies of related records together should solve that problem at the indexing level
18:38 thd tumer: If my result set is small it may be manageable but if my result set is large it is the same problem as with 10,000 biblio matches and knowing which are in a particular library
18:37 tumer well its endless i know
18:36 thd tumer: yet suppose I do a subject search and want to sort by the ones with the largest number of holdings or some other factor of most used
18:35 thd tumer: yes if you only want to retrieve matches from one authority at a time
18:34 tumer searching the authorities (separately) during retrieval is fast enough already
18:33 thd tumer: given the indexing limitations of Zebra including authorities in the meta-record seems to me the only way to do interesting things with authorities
18:31 thd tumer: including authorities in the meta-record should give you the same advantage as including holdings in the meta-record with bibliographic records
18:30 tumer i know but i have to look into some benchmarks
18:29 tumer try again i think now they have it
18:29 thd tumer: do you understand the indexing advantage of having related copies of authorities in the same meta-record as bibliographic records?
18:29 thd tumer: I downloaded it last about a year ago
18:29 tumer typical brits
18:28 thd tumer: I downloaded from a BL link and they left it out with a nice note saying that I could buy commercial support from the original developer if I wanted more default features
18:27 tumer well british library provides the one for unimarc both ways
18:26 thd tumer: I have usemarcon but I have no functioning configuration files for doing any conversion
18:25 thd tumer: yes many duplicates but very efficient retrieval by authority tracings and references
18:24 tumer yes it uses usemarcon utility from any marc to anymarc
18:24 thd ?
18:24 thd tumer: really, does it do the inverse as well
18:24 tumer and duplicate every authority for 1000 times?
18:23 tumer yazproxy converts from marc21 to unimarc on the fly
18:23 thd tumer: I think it will be very useful for you to have related authorities in the same meta-record as a bibliographic record
18:22 thd tumer: not merely something that can work
18:22 tumer thats what i do for holdings
18:22 thd tumer
18:22 thd tumer: I am trying to find the maximally efficient method for indexing
18:21 tumer than you can index any part of meta-record as you wish
18:21 thd and separate them by a different XPATH
18:21 tumer that is possible
18:21 thd tumer: I want to put them in the same meta-record
18:20 tumer same server but different db names will do
18:19 thd tumer: and if you look closely I want to be able to index UNIMARC and MARC 21 and anything else in the same DB.
18:18 thd tumer: the answer for efficiency was a focus of my koha zebra list question
18:17 tumer i am not quite sure yet
18:16 thd tumer: is retrieval efficiency a function of the XPATH length in any way?
18:15 tumer koharecord/holdings/record/datafield(856)/subfield(a)
18:14 thd tumer: What do you use?
18:14 tumer although i do not use that any more. Thats dev-week indexing
18:14 tumer yes
18:14 thd tumer: and that works for MARCXML in addition to MARC?
18:13 tumer or just melm 245 Title
18:13 tumer syntax is melm 700$a Author melm 700$d Date etc.
18:12 thd tumer: what method do we use to distinguish the subfields from different fields?
18:11 tumer i index 952 at 25 different subfield level
18:11 tumer no we index at either level field or subfield by subfield
18:10 thd tumer: I thought we were only indexing fields mostly not subfields specifically
18:10 tumer no they are melm 700$a and melm 200$a
18:10 thd tumer: in MARCXML at the element level all $a look alike
18:10 tumer well we already do dont we?
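The melm lines tumer quotes come from Zebra's GRS-1 .abs configuration, where a MARC field or a single subfield is mapped to a named index register. A minimal record.abs fragment in that dev-week style might look like this; the index names are illustrative, not Koha's exact set:

```
# record.abs fragment (dev-week style sketch; index names illustrative)
attset bib1.att
esetname F @
# melm maps a whole field or one subfield to a named register,
# which is how 700$a stays distinct from 200$a even though both
# look like a plain $a at the MARCXML element level
melm 245        Title
melm 700$a      Author
melm 700$d      Date
melm 952$p      barcode
```

Indexing subfield by subfield this way is what lets tumer index 952 at 25 different subfield levels.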
18:09 thd tumer: the problem paul was trying to answer was how to index $a in 200 differently from $a in 700
18:07 tumer yes i read it but did not understand MARC's suggestion
18:07 thd tumer: did you see that part of my koha-zebra list message?
18:07 tumer i am not quite sure what he said
18:06 thd tumer: but about what Marc told paul
18:06 tumer yes and you may work and improve on this. not in my priorities list
18:05 thd tumer: if Koha can provide a design for such a major preoccupation of library science for the past 10 years then it could be very important for Koha to.
18:03 tumer and not much important for me at this stage for koha
18:03 thd tumer: yes but very primitive for works and expressions only and not efficient
18:02 tumer from marcxml to frbr
18:02 tumer i know but there is a conversion route defined by loc
18:01 thd at least not yet
18:01 thd tumer: FRBR is not a record type
18:01 tumer on the fly conversion to any type
18:00 tumer using the new alvis filter of zebra and an xmlstyle sheet i can index anything and now i can get DC,MOD and FRBR from my meta-record
18:00 thd or FRAR, FRASR, etc.
17:59 thd tumer: yes but it does not have one for FRBR
17:59 tumer thd:marcxml already has type definition for bibliographic,holdings,authorities,community etc that you can set as <record> attribute
17:53 thd tumer: my purpose is this http://wiki.koha.org/doku.php?id=en:development:super_meta_record_db
17:53 thd tumer: http://lists.nongnu.org/archive/html/koha-zebra/2006-08/msg00001.html
17:51 tumer i dont remember this
17:50 thd tumer: Paul was satisfied with a slower method for what he had asked about indexing authorities at the time
17:49 thd tumer: In the message that I quoted, Marc tells Paul there is a fast method
17:49 tumer did i miss that?
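Tumer's point about getting DC, MODS, or FRBR from the meta-record "on the fly" reflects how the alvis filter separates one indexing stylesheet from any number of retrieval stylesheets. A sketch of such a filter configuration, loosely after the Zebra manual's alvis example; the file names and schema list here are invented, not tumer's actual setup:

```xml
<!-- alvis filter config sketch; zebra.cfg would reference this file
     via its recordType setting. File names are illustrative. -->
<schemaInfo>
  <!-- the schema carrying the zebra xslt identifier drives indexing -->
  <schema name="index" identifier="http://indexdata.dk/zebra/xslt/1"
          stylesheet="koha2index.xsl"/>
  <!-- the other schemas are applied at retrieval time, on the fly -->
  <schema name="dc"   stylesheet="marcxml2dc.xsl"/>
  <schema name="mods" stylesheet="marcxml2mods.xsl"/>
</schemaInfo>
```

One stored meta-record can thus be served in several presentation formats without re-indexing, at the cost tumer notes: retrieval returns XML rather than a compact MARC record, roughly twice the size.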
17:49 thd tumer: In my koha-zebra list message I quoted an Index Data zebra list message from February
17:48 tumer about twice the size of marc-record
17:48 tumer lots of verbal coming through
17:47 thd tumer: why is getting XML necessarily slower?
17:47 tumer but i could not find any other way
17:47 tumer the retrieval is slower cause i have to get xml and not marc-record
17:47 thd tumer: slow retrieval was what I was trying to avoid
17:46 tumer slower on retrieval, very fast indexing overal acceptable and scalable
17:45 thd tumer: how well does that scale?
17:45 tumer the schema allows me to define them separately as xpaths
17:44 tumer my holdings records have their own 001 004 005 and 008 separately indexed
17:44 thd tumer: what method do you use?
17:44 thd tumer: I asked poorly the first time and now they are punishing me
17:44 tumer i already do that
17:43 tumer well you see you can index say 001 bibliographic separately from 001 holdings
17:43 thd tumer: Index Data has not answered
17:43 thd tumer: I asked a question about indexing on the koha-zebra list
17:43 tumer top-two meaning kohacollection and koharecord?
17:42 thd tumer: I just had not understood how the top two elements in your schema worked
17:42 tumer was it your question at yaz-list?
17:41 thd tumer: no I definitely approve
17:41 tumer you mean you do not approve?
17:41 thd tumer: that was what I had assumed
17:36 tumer you may now
17:36 tumer one sec
17:36 thd tumer: I cannot connect to that
17:36 tumer i already put it to production code at NEU
17:35 tumer schema is there together with koharaecod.xsd
17:35 tumer thd: look at http://library.neu.edu.tr/kohanamespace
17:34 thd tumer: hello. Would you explain the top two levels of your proposed XML schema to me?
17:33 tumer hi thd
17:31 thd tumer [A]: are you there?
15:09 owen Yeah, loads
15:08 Burgwork owen, having fun with css and IE?
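Piecing together tumer's descriptions in this log (the top-two elements kohacollection and koharecord, holdings with their own separately indexed 001/004/005/008, and the path koharecord/holdings/record/datafield(856)/subfield(a)), the meta-record presumably nests something like the sketch below. Only the element names mentioned in the chat are taken from tumer; the sample values and the MARCXML details are illustrative guesses, not the actual koharecord.xsd:

```xml
<!-- meta-record structure sketch reconstructed from the chat -->
<kohacollection xmlns:marc="http://www.loc.gov/MARC21/slim">
  <koharecord>
    <bibliographic>
      <marc:record type="Bibliographic">
        <marc:controlfield tag="001">1234</marc:controlfield>
      </marc:record>
    </bibliographic>
    <holdings>
      <marc:record type="Holdings">
        <marc:controlfield tag="001">5678</marc:controlfield>
        <marc:datafield tag="856" ind1=" " ind2=" ">
          <marc:subfield code="a">library.example.org</marc:subfield>
        </marc:datafield>
      </marc:record>
    </holdings>
  </koharecord>
</kohacollection>
```

Because the bibliographic 001 and the holdings 001 sit on different xpaths, an xpath-aware indexing stylesheet can send them to different indexes (biblionumber versus itemnumber), which is the "hybrid xpath indexing" tumer describes.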