IRC log for #koha, 2006-08-12

Time	Nick	Message
15:08	Burgwork	owen, having fun with css and IE?
15:09	owen	Yeah, loads
17:31	thd	tumer [A]: are you there?
17:33	tumer	hi thd
17:34	thd	tumer: hello. Would you explain the top two levels of your proposed XML schema to me?
17:35	tumer	thd: look at http://library.neu.edu.tr/kohanamespace
17:35	tumer	schema is there together with koharaecod.xsd
17:36	tumer	i already put it to production code at NEU
17:36	thd	tumer: I cannot connect to that
17:36	tumer	one sec
17:36	tumer	you may now
17:41	thd	tumer: that was what I had assumed
17:41	tumer	you mean you do not approve?
17:41	thd	tumer: no I definitely approve
17:42	tumer	was it your question at yaz-list?
17:42	thd	tumer: I just had not understood how the top two elements in your schema worked
17:43	tumer	top-two meaning kohacollection and koharecord?
17:43	thd	tumer: I asked a question about indexing on the koha-zebra list
17:43	thd	tumer: Index Data has not answered
17:43	tumer	well you see you can index say 001 bibliographic separately from 001 holdings
17:44	tumer	i already do that
17:44	thd	tumer: I asked poorly the first time and now they are punishing me
17:44	thd	tumer: what method do you use?
17:44	tumer	my holdings records have their own 001 004 005 and 008 separately indexed
17:45	tumer	the schema allows me to define them separately as xpaths
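A minimal sketch of the point tumer makes here, assuming a meta-record in which the bibliographic record sits directly under `koharecord` and the holdings records sit under `koharecord/holdings`. The element names `koharecord`, `holdings` and `record` come from the conversation; the exact placement of the bibliographic record and the sample values are assumptions. It only shows that two `001` control fields become distinguishable once each has its own path; it is not the Zebra indexing configuration itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical meta-record; layout of the bibliographic part is assumed.
META_RECORD = """
<koharecord>
  <record type="Bibliographic">
    <controlfield tag="001">12345</controlfield>
  </record>
  <holdings>
    <record type="Holdings">
      <controlfield tag="001">987</controlfield>
    </record>
  </holdings>
</koharecord>
""".strip()

root = ET.fromstring(META_RECORD)

# Same tag (001), different paths: one path per index.
biblio_001 = root.find("record/controlfield[@tag='001']").text
holdings_001 = [
    e.text for e in root.findall("holdings/record/controlfield[@tag='001']")
]

print("biblionumber:", biblio_001)
print("itemnumbers:", holdings_001)
```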
17:45	thd	tumer: how well does that scale?
17:46	tumer	slower on retrieval, very fast indexing; overall acceptable and scalable
17:47	thd	tumer: slow retrieval was what I was trying to avoid
17:47	tumer	the retrieval is slower cause i have to get xml and not marc-record
17:47	tumer	but i could not find any other way
17:47	thd	tumer: why is getting XML necessarily slower?
17:48	tumer	lots of verbal coming through
17:48	tumer	about twice the size of marc-record
17:49	thd	tumer: In my koha-zebra list message I quoted an Index Data zebra list message from February
17:49	tumer	did i miss that?
17:49	thd	tumer: In the message that I quoted, Marc tells Paul there is a fast method
17:50	thd	tumer: Paul was satisfied with a slower method for what he had asked about indexing authorities at the time
17:51	tumer	i dont remember this
17:53	thd	tumer: http://lists.nongnu.org/archiv[…]-08/msg00001.html
17:53	thd	tumer: my purpose is this http://wiki.koha.org/doku.php?[…]er_meta_record_db
17:59	tumer	thd:marcxml already has type definition for bibliographic,holdings,authorities,community etc that you can set as <record> attribute
17:59	thd	tumer: yes but it does not have one for FRBR
18:00	thd	or FRAR, FRASR, etc.
18:00	tumer	using the new alvis filter of zebra and an xml stylesheet i can index anything and now i can get DC, MODS and FRBR from my meta-record
18:01	tumer	on the fly conversion to any type
18:01	thd	tumer: FRBR is not a record type
18:01	thd	at least not yet
18:02	tumer	i know but there is a conversion route defined by loc
18:02	tumer	from marcxml to frbr
18:03	thd	tumer: yes but very primitive for works and expressions only and not efficient
18:03	tumer	and not much important for me at this stage for koha
18:05	thd	tumer: if Koha can provide a design for such a major preoccupation of library science for the past 10 years then it could be very important for Koha too.
18:06	tumer	yes and you may work and improve on this. not in my priorities list
18:06	thd	tumer: but about what Marc told paul
18:07	tumer	i am not quite sure what he said
18:07	thd	tumer: did you see that part of my koha-zebra list message?
18:07	tumer	yes i read it but did not understand MARC's suggestion
18:09	thd	tumer: the problem paul was trying to answer was how to index $a in 200 differently from $a in 700
18:10	tumer	well we already do dont we?
18:10	thd	tumer: in MARCXML at the element level all $a look alike
18:10	tumer	no they are melm 700$a and melm 200$a
18:10	thd	tumer: I thought we were only indexing fields mostly not subfields specifically
18:11	tumer	no we index at either level field or subfield by subfield
18:11	tumer	i index 952 at 25 different subfield level
18:12	thd	tumer: what method do we use to distinguish the subfields from different fields?
18:13	tumer	syntax is melm 700$a Author melm 700$d Date etc.
18:13	tumer	or just melm 245 Title
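A small sketch of why `$a` in one field need not look like `$a` in another: the subfield is keyed by its parent datafield's tag, which is what the `700$a` / `200$a` notation above expresses. The sample record and the helper name `subfield_keys` are hypothetical, and this is plain Python over MARCXML-like input, not Zebra's own indexing code.

```python
import xml.etree.ElementTree as ET

# Hypothetical record without namespaces, for illustration only.
MARCXML = """
<record>
  <datafield tag="200" ind1=" " ind2=" ">
    <subfield code="a">A title</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">An author</subfield>
    <subfield code="d">1900-1980</subfield>
  </datafield>
</record>
""".strip()


def subfield_keys(xml_text):
    """Return (tag$code, value) pairs, so each $a carries its field tag."""
    root = ET.fromstring(xml_text)
    pairs = []
    for field in root.iter("datafield"):
        for sub in field.findall("subfield"):
            pairs.append((f"{field.get('tag')}${sub.get('code')}", sub.text))
    return pairs


keys = subfield_keys(MARCXML)
for key, value in keys:
    print(key, value)
```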
18:14	thd	tumer: and that works for MARCXML in addition to MARC?
18:14	tumer	yes
18:14	tumer	although i do not use that any more. Thats dev-week indexing
18:14	thd	tumer: What do you use?
18:15	tumer	koharecord/holdings/record/datafield(856)/subfield(a)
18:16	thd	tumer: is retrieval efficiency a function of the XPATH length in any way?
18:17	tumer	i am not quite sure yet
18:18	thd	tumer: the answer for efficiency was a focus of my koha zebra list question
18:19	thd	tumer: and if you look closely I want to be able to index UNIMARC and MARC 21 and anything else in the same DB.
18:20	tumer	same server but different db names will do
18:21	thd	tumer: I want to put them in the same meta-record
18:21	tumer	that is possible
18:21	thd	and separate them by a different XPATH
18:21	tumer	then you can index any part of meta-record as you wish
18:22	thd	tumer: I am trying to find the maximally efficient method for indexing
18:22	thd	tumer
18:22	tumer	thats what i do for holdings
18:22	thd	tumer: not merely something that can work
18:23	thd	tumer: I think it will be very useful for you to have related authorities in the same meta-record as a bibliographic record
18:23	tumer	yazproxy converts from marc21 to unimarc on the fly
18:24	tumer	and duplicate every authority for 1000 times?
18:24	thd	tumer: really, does it do the inverse as well
18:24	thd	?
18:24	tumer	yes it uses usemarcon utility from any marc to anymarc
18:25	thd	tumer: yes many duplicates but very efficient retrieval by authority tracings and references
18:26	thd	tumer: I have usemarcon but I have no functioning configuration files for doing any conversion
18:27	tumer	well british library provides the one for unimarc both ways
18:28	thd	tumer: I downloaded from a BL link and they left it out with a nice note saying that I could buy commercial support from the original developer if I wanted more default features
18:29	tumer	typical brits
18:29	thd	tumer: I downloaded it last about a year ago
18:29	thd	tumer: do you understand the indexing advantage of having related copies of authorities in the same meta-record as bibliographic records?
18:29	tumer	try again i think now they have it
18:30	tumer	i know but i have to look into some benchmarks
18:31	thd	tumer: including authorities in the meta-record should give you the same advantage as including holdings in the meta-record with bibliographic records
18:33	thd	tumer: given the indexing limitations of Zebra including authorities in the meta-record seems to me the only way to do interesting things with authorities
18:34	tumer	searching the authorities (separately) during retrieval is fast enough already
18:35	thd	tumer: yes if you only want to retrieve matches from one authority at a time
18:36	thd	tumer: yet suppose I do a subject search and want to sort by the ones with the largest number of holdings or some other factor of most used
18:37	tumer	well its endless i know
18:38	thd	tumer: If my result set is small it may be manageable but if my result set is large it is the same problem as with 10,000 biblio matches and knowing which are in a particular library
18:39	thd	tumer: storing copies of related records together should solve that problem at the indexing level
18:39	thd	together in the same meta-record
18:40	tumer	thd:i understand the question. It did not pose a problem to me yet. if you have such a need then you have to include them in your meta-record
18:40	thd	tumer: your users are not demanding enough
18:41	thd	tumer: you need me breaking your system with a single query
18:41	tumer	my users only take what i give them
18:42	thd	tumer: I used to cause stack overflows at a fairly distant library's circulation system by borrowing too much at one time to avoid the 5 hour round trip commute
18:44	thd	tumer: do you understand well what FRBR does?
18:44	thd	tumer: I have a simple power point for you
18:45	tumer	i have seen a powerpoint at oclc i think it was
18:45	thd	yes this one http://www.oclc.org/research/p[…]eill/frbrddb2.ppt
18:45	tumer	power fights
18:46	thd	tumer: try this though which includes FRAR and FRSAR http://www.kaapeli.fi/~fla/frb[…]ject%20access.pdf
18:47	tumer	well nice talking to you thd, i'll check those later i have to go to sleep now. G'night!
18:48	thd	tumer: look at that pdf link when you are awake
18:48	thd	tumer: good night
18:48	tumer	will
01:10	ai	hi,
01:11	ai	any here ?? can give me some help plz
02:37	hdl	hi
02:37	osmoze	hello all
02:37	toins	hello
08:54	thd	kados: are you there?
08:57	kados	thd: of course :-)
08:58	thd	kados: did you see the conversation I had with tumer last night in the logs?
08:58	kados	no
08:59	paul	&kados, take time to read my mail on koha-devel & what you think of my suggestion.
08:59	paul	(& good morning to kados & thd)
08:59	thd	kados: tumer has implemented his schema for bibliographic and holdings records in one record
08:59	thd	good morning paul
09:00	thd	kados: he is indexing fields differently with xpath
09:01	thd	paul: I do not find any abs files using xpath in CVS. Am I looking in the wrong place?
09:02	thd	kados: I am suspecting that tumer has not yet committed his recent improvements
09:03	thd	kados: I only see devel-week related files for zebra in CVS
09:05	thd	kados: are you still there?
09:19	kados	paul: did you see mj's response?
09:19	paul	not yet arrived.
09:19	kados	paul: i agree with much of what he says there
09:19	kados	wait ...
09:20	kados	that was your response to MJ :-)
09:20	paul	ah, ok
09:20	kados	yes, I very much agree
09:20	kados	I will try to respond today
09:20	kados	phone running off the hook today ...
09:21	kados	thd: he hasn't committed his stuff yet
09:21	thd	kados: yes I just checked the koha-cvs logs
09:49	thd	slef: you fix users instead of problems?
09:50	paul	thd : lol
09:50	thd	slef: are users the problem?
09:52	thd	tumer: are you really here
09:52	tumer	yes looking for paul
09:52	paul	hi tumer, i'm here
09:52	paul	(as well as toins)
09:52	thd	tumer: do you use xpath in your queries
09:52	tumer	hi paul and toins and thd
09:52	toins	hi tumer
09:52	dewey	hi tumer is still struggling
09:53	thd	hello tumer
09:53	tumer	paul: why dont we have the structure as it installs
09:53	tumer	i always thought you keep cvs this way to make it DIFFICULT to install
09:54	paul	tumer : I agree with you, as what I suggest is almost the install structure in fact ;-)
09:54	tumer	my installer makes koha ->intranet &opac
09:54	thd	tumer: you mean that you want CVS to be organised the same as the install?
09:54	paul	(I just separate htdocs & non htdocs for templates)
09:55	paul	thd : yep.
09:55	tumer	intranet -> cgi-bin & htdocs
09:55	tumer	opac -> cgi-bin & htdocs
09:55	paul	tumer: i'm not sure we need this additional level, as it will just contain 2 sub dirs
09:55	paul	so, 2x2 or 4 ?
09:55	paul	I think 4 is OK
09:55	tumer	so separate htdocs for both opac and intranet
09:56	paul	yes, in my structure they are.
09:56	tumer	well 2_2 installer does that on windows
09:56	tumer	so why not have it as it should be once installed as well from the beginning
09:57	thd	paul: Do you mean that your original suggestion was to have CVS organised the same as the install?
09:57	tumer	so even untarring it should be enough if you are not running upgrade
09:57	tumer	yes thats what i am saying
09:57	paul	thd : yes indeed.
09:58	paul	but maybe we should continue to speak of this on mailing list, to let other express their opinion
09:58	tumer	k
09:59	thd	paul: Are you suggesting changing both CVS and install then to something matching, not keeping either the same as now?
09:59	paul	thd : look at my mail on koha-devel, i hope it's self explanatory
09:59	thd	tumer: before you leave, do you use xpath in your queries?
09:59	paul	(at least I really hope, because it needed almost 1 hour to write !)
09:59	tumer	paul: i think we have not solved the utf8 problem either. We just managed to get over it
10:00	paul	thd: nope
10:00	tumer	thd: i use xsl stylesheets with xpath indexing
10:00	paul	hdl may be interested by this tumer, as he should play with encoding problems next weeks
10:01	thd	tumer: so your queries are no different than without XPATH?
10:01	tumer	i had problems using XML::LibXML, same problem we had with MARC::File::USMARC
10:02	tumer	so probably we are missing something
10:02	tumer	whats wrong with irc, am i on or not?
10:02	thd	tumer: but you have no queries like find @attr 1=/*/datafield[@tag='245']/subfield[@code='a'] someterm
10:03	thd	tumer: you are on now
10:03	thd	tumer: I just posted a hypothetical example query from Sebastian in February
10:04	tumer	thd: no i still use the old way of searching
10:04	tumer	i index every field that i will require
10:04	tumer	even if they do not exist now
10:04	thd	tumer: when do you plan to commit your abs file?
10:04	tumer	all 3.0 stuff
10:05	tumer	it will not change the way you query though
10:05	thd	tumer: I understand but I want to see exactly so that I can understand perfectly what you are doing
10:06	tumer	i will commit when toins and paul are ready
10:06	thd	tumer: why wait for them?
10:06	tumer	thats whats agreed
10:06	thd	tumer: I am ready :)
10:06	tumer	synching and so on
10:07	thd	tumer: what does it harm if you commit early. Oh, does that harm synching?
10:07	tumer	thd:yes on IRC meeting thats whats agreed
10:08	thd	tumer: would you commit your abs file to me?
10:08	tumer	there is no more abs file, a whole bunch of xsl files
10:09	thd	tumer: you use xsl files for indexing?
10:09	tumer	yes
10:09	thd	tumer: is that documented?
10:09	tumer	ID zebra documented
10:10	thd	tumer: yes, is that in the ID Zebra documentation?
10:10	tumer	yes thats what i said
10:10	thd	tumer: ok, I have not read it thoroughly enough
10:11	tumer	get the new one from their cvs
10:11	thd	tumer: let me give a good example for the trivial one I gave last night
10:11	thd	tumer: maybe I am using an out of date set of documentation
10:12	thd	tumer: in authorities you can search by references and tracings to search for the authorised form using non-authorised forms
10:13	thd	tumer: currently to fill authorised headings a separate search must be done for each authorised heading
10:14	thd	tumer: currently to fill authorised headings in the search form a separate search must be done for each authorised heading
10:15	thd	tumer: that is a good careful but not extra quick way of performing searches using authorities
10:17	thd	tumer: it is also often necessary because the user may never successfully guess the authorised heading unless the user is a librarian with years of experience or otherwise especially familiar with the authorised headings needed
10:17	dewey	okay, thd.
10:18	thd	tumer: however, there could be an option to search the authorities references and tracings directly from the search form collectively
10:18	thd	tumer: so instead of building the query slowly for more than one authorised heading the user types in whatever terms come to mind
10:19	thd	tumer: the only way that would work for indexing is if the meta-record contained authorities
10:20	tumer	but bibliographic record already has 650 with authorities filled in
10:20	thd	tumer: this would allow finding records with the conjunction of two subject headings without knowing the precise headings
10:20	tumer	or 100 or 700 for that matter
10:21	thd	tumer: yes but the 650 100 700 only contains the authorised heading
10:21	tumer	what else are we looking for, enlight me
10:22	tumer	s/enligten/
10:22	thd	tumer: with authorities you can find authorised headings by searching the 4XX 5XX in authority records for non-authorised forms
10:22	tumer	or whatever
10:22	tumer	i see
10:22	thd	tumer: so for example maize is a food plant native to North America
10:23	thd	tumer: maize is no longer the authorised heading under LCSH
10:23	tumer	so you want to find maize by searching corn
10:23	thd	tumer: yes
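A rough sketch of the idea thd is describing, assuming each bibliographic meta-record carries a copy of the see-from (4XX) forms of its authorised headings. All data, field names and function names below are hypothetical, and "Corn" as the authorised form with "Maize" as a variant is sample data only. The point is that a query on non-authorised terms, including a conjunction of two of them, can be answered from one index without a separate authority search per heading.

```python
# Hypothetical authority records: authorised heading plus see-from forms.
authorities = [
    {"heading": "Corn", "see_from": ["Maize", "Indian corn"]},
    {"heading": "Plant diseases", "see_from": ["Phytopathology"]},
]

# Hypothetical bibliographic records holding only authorised headings (650).
biblios = [
    {"biblionumber": 1, "subjects": ["Corn"]},
    {"biblionumber": 2, "subjects": ["Corn", "Plant diseases"]},
    {"biblionumber": 3, "subjects": ["Plant diseases"]},
]


def build_meta_index(biblios, authorities):
    """Index each biblio under its authorised headings and their variants."""
    variants = {a["heading"]: a["see_from"] for a in authorities}
    index = {}
    for biblio in biblios:
        for heading in biblio["subjects"]:
            for term in [heading, *variants.get(heading, [])]:
                index.setdefault(term.lower(), set()).add(biblio["biblionumber"])
    return index


def search(index, *terms):
    """Return biblionumbers matching all terms (authorised or not)."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()


index = build_meta_index(biblios, authorities)
print(search(index, "maize"))
print(search(index, "maize", "phytopathology"))
```

The cost tumer raises elsewhere in the log is visible here too: every variant form is duplicated into every record that uses the heading, so the index grows with the number of bibliographic records rather than the number of authorities.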
10:24	thd	tumer: I can do that now for more than one authority by building the query one authority at a time
10:24	tumer	the proxy or irc is blocking me today
10:25	thd	tumer: that is the slow careful good method
10:26	thd	tumer: users growing up with Google are unlikely to have patience to be slow and careful most of the time
10:26	tumer	i see where you are heading thd
10:27	thd	tumer: the system can still give people good results faster and they can still use authorities to refine their query afterwards
10:29	thd	tumer: this is not quite FRSAR it is basic linking that any system should be able to do yet only Sirsi Unicorn does to my knowledge
10:30	thd	tumer: paul just wants to be certain that behaviour is not default for his users so as not to surprise them when the result set does not match the query terms precisely
10:31	thd	tumer: this is merely an explicitly defined relationship contained in library systems records
10:32	thd	tumer: some FR*R relationships are not explicitly defined and would require something extra
10:33	tumer	my experience is we are giving too many answers to the user; they prefer fewer, more precise answers
10:33	thd	tumer: they can always have that option
10:33	thd	tumer: paul just wants to be certain that behaviour is not default for his users so as not to surprise them when the result set does not match the query terms precisely
10:33	dewey	i already had it that way, thd.
10:34	thd	tumer: yet, most users fail to do successful subject searches because they seldom choose the correct authorised terms
10:35	thd	tumer: the Google mentality is that any results are good enough
10:36	thd	tumer: library systems allow better results than Google
10:36	thd	tumer: we should not deprive users of better results
10:38	thd	tumer: if my very precise but uninformed (ignorant of the actual database content for authorised forms) query returns 3 bib records I may or may not be satisfied.
10:39	thd	tumer: if the best record for addressing some problem that I am trying to solve is in a larger set with 10 records then I need the 10 record result set and 3 records was insufficient
10:41	thd	tumer: subject searches with a small number of query terms in a large collection tend to give much larger result sets and the user needs help from the system.
10:42	thd	tumer: do you not see that as a significant advantage for every user as long as the user has the option of turning the behaviour on or off for the query
10:42	thd	?
10:43	tumer	well it is an advanced system
10:43	thd	tumer: Koha is already an advanced system
10:44	thd	tumer: what is difficult about adding copies of authorities to meta-records
10:44	tumer	i did not say difficult, its feasible
10:45	thd	tumer: paul has had the references and tracings working for building queries slowly for a couple of years
10:45	thd	tumer: what is your hesitation over feasible?
10:46	tumer	why dropped out?
10:46	thd	tumer: paul has had the references and tracings working for building queries slowly for a couple of years
10:46	tumer	having a too big record to index is my concern
10:46	thd	tumer: do you think that would be a performance problem?
10:47	tumer	thats the only concern i have
10:48	thd	tumer: that is also my concern which is why I asked Index Data about the efficiency of XPATH indexing
10:48	thd	tumer: they have not responded
10:49	tumer	they have answered before saying its slow on indexing
10:49	thd	tumer: I did not ask my question of Index Data correctly the first time
10:50	thd	tumer: maybe there could be a supplementary database with larger meta-records which was slower to index
10:50	tumer	you could not do this with existing version of zebra anyway. only forthcoming zebra
10:51	thd	tumer: why, what is the problem?
10:51	tumer	no meta-record indexing was possible
10:51	thd	tumer: Does your meta-record work?
10:52	tumer	yes but with cvs zebra
10:52	thd	tumer: so I understand that this only works in CVS now
10:52	tumer	not released yet
10:52	tumer	yes
10:53	thd	tumer: yet, It is intended for release in due course
10:53	tumer	hope so
10:53	thd	:)
10:54	thd	tumer: what would be wrong with having a supplementary database of records which was slower to index
10:54	tumer	i was thinking along the smae lines
10:54	tumer	same
10:55	thd	tumer: those could be updated by a batch process while the smaller records were updated in real time
10:55	tumer	correct
10:55	thd	tumer: well then we are thinking along similar paths
10:56	tumer	i even slowed my realtime updating to within 2 minutes, safer on zebra db
10:57	thd	tumer: Joshua was willing to rephrase my XPATH indexing question
10:58	tumer	i already answered your xpath question i thought
10:58	thd	tumer: we should have some answer from Index Data about how to get maximum performance from XPATH
10:58	tumer	i index xpath with xslt stylesheets
10:58	thd	tumer: my question is really about whether shorter XPATHs make a difference in performance
10:59	tumer	at indexing it makes it faster not for searching
10:59	thd	tumer: yet that is important
11:00	tumer	but having said that if you xpath everything then i think it will be slow and cumbersome
11:00	thd	tumer: we might be able to design meta-records with shorter XPATHs
11:00	thd	tumer: do you not XPATH everything now?
11:01	thd	tumer: do you not XPATH everything now?
11:01	tumer	i have the shortest path to bibliographic record keeping in sync with marc21
11:01	tumer	no i do not
11:01	tumer	i do not index everything
11:02	tumer	similar to record.abs
11:02	thd	tumer: so you still have elem 100$a sometimes?
11:02	tumer	i choose what to index
11:02	tumer	similar structure
11:02	tumer	i choose which paths to index
11:02	tumer	and only those
11:04	thd	tumer: I do not quite understand what you mean by choosing which paths to index except as opposed to indexing every arbitrary and unneeded path
11:04	tumer	so its a hybrid xpath indexing. xpath only allows me to index same datafields with different indexes
11:04	tumer	001 bibliographic for biblionumber 001 holdings for itemnumber etc
11:05	tumer	cause they have different xpaths
11:05	thd	tumer: so if the same datafield is not being indexed you use the devel_week method of elem 100$a or whatever because it is faster?
11:06	thd	tumer: so if the same datafield is not being indexed you use the devel_week method of elem 100$a or whatever because it is faster?
11:06	dewey	i already had it that way, thd.
11:06	tumer	yes but elem 100$a does not distinguish different datafields, this does
11:08	tumer	i do not use xpath enabled indexing, it is slow and so says the ID documentation
11:08	thd	tumer: ahh, so your xslt method is much faster
11:09	tumer	yes
11:09	tumer	100K metarecords less than 10 min
11:10	thd	tumer: I had assumed the slowness for xpath enabled was a function of allowing xpath in queries
11:11	tumer	no it is slow in indexing not in retrieval
11:11	thd	tumer: Marc wrote that you could speed things up with xpath enabled by indexing xpaths
11:12	tumer	xpath enabled makes bigger indexes and is slow
11:12	tumer	but he suggested xelem which does not exist
11:12	tumer	not out of the box, pay ID and they will write it
11:13	thd	tumer: yes, Sebastian corrected him. But I misunderstood what out of the box meant.
11:14	tumer	if there is nothing else i have to go for dinner now
11:14	thd	tumer: would you zip or tar your xslt files so that I can see them?
11:15	tumer	one sec
11:15	thd	tumer: I would like to see everything related to how you are indexing now
11:16	thd	tumer: I want to understand this perfectly
11:17	tumer	http://library.neu.edu.tr/koha[…]ce/koha2index.xsl
