Time |
S |
Nick |
Message |
12:46 |
|
kados |
sanspach: hi there |
12:46 |
|
sanspach |
hey! |
12:46 |
|
kados |
sanspach: it seems I'm having some problems with the data |
12:46 |
|
kados |
zebra is complaining when I index |
12:46 |
|
kados |
but I don't have any details yet |
12:46 |
|
kados |
I'm thinking of running a check on the records using MARC::Record |
12:46 |
|
kados |
later today |
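A hedged sketch of the kind of check kados describes: MARC::Batch (part of the MARC::Record distribution) reads a file of USMARC/ISO 2709 records and accumulates warnings for any that break the rules. The file name is invented.

#!/usr/bin/perl
use strict;
use warnings;
use MARC::Batch;

# Walk a file of ISO 2709 records and report whatever MARC::Batch complains about.
my $batch = MARC::Batch->new( 'USMARC', 'records.mrc' );   # hypothetical file name
$batch->strict_off();   # log bad records and keep going instead of dying

my $count = 0;
while ( my $record = $batch->next() ) {
    $count++;
    for my $warning ( $batch->warnings() ) {
        print "record $count: $warning\n";
    }
}
print "checked $count records\n";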
12:46 |
|
sanspach |
there may be some usmarc rules that aren't followed |
12:47 |
|
sanspach |
I'm thinking in particular that there may be repeated 001 fields |
12:47 |
|
kados |
sanspach: it's strange since it gets through quite a few records before crashing |
12:47 |
|
kados |
hmmm, that might be a problem |
12:48 |
|
sanspach |
our system doesn't natively store in MARC and therefore doesn't enforce usmarc rules |
12:48 |
|
sanspach |
I knew I needed to strip out our non-standard subfields |
12:48 |
|
sanspach |
the newer records (created on the current system, ca. 4.5 yrs) would have only a single 001
12:49 |
|
sanspach |
but records imported from old system went through a merge algorithm and ended up with multiple 001 fields |
12:49 |
|
sanspach |
didn't think about it until after I'd sent the file |
12:49 |
|
kados |
interesting |
12:50 |
|
kados |
do you think it would be easier to modify them in the big raw file or on your end?
12:50 |
|
sanspach |
depends on your tools; I can't natively work with them in MARC, so I do the edits then convert |
12:51 |
|
sanspach |
if you can work with them in MARC, it might be better for you to manipulate within the file |
12:52 |
|
sanspach |
also, when I looked at bits of the files I noticed that fields aren't always in usmarc order-- |
12:52 |
|
sanspach |
specifically 008 seems to be in odd places (sometimes at end) |
12:54 |
|
kados |
I'll give it a shot and if I can't do it I'll let you know |
12:55 |
|
sanspach |
great; I've got the data all extracted now, so it will just be a matter of re-parsing the records and converting |
12:55 |
|
kados |
sweet |
13:04 |
|
kados |
sanspach: if it's not extracted as MARC what is it extracted as? |
13:04 |
|
kados |
(out of curiosity) |
13:04 |
|
sanspach |
flat ASCII! (I live and die by Perl, but I use ActiveState on Win32, so the MARC:: modules aren't available)
13:10 |
|
kados |
sanspach: right |
13:10 |
|
kados |
sanspach: so how big is the flat ascii file? |
13:10 |
|
kados |
sanspach: it might actually be easier for me to import that with MARC::Record (as it will automatically enforce correct MARC syntax) |
13:11 |
|
sanspach |
don't have one (merged them after converting to MARC) but could easily do it; in fact, |
13:11 |
|
sanspach |
I could remove the duplicate 001's as I'm merging |
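On the MARC side, stripping the repeated 001s is straightforward with MARC::Record; a hedged sketch (file names invented):

use strict;
use warnings;
use MARC::Batch;

my $batch = MARC::Batch->new( 'USMARC', 'all.mrc' );       # hypothetical input
open my $out, '>', 'all-dedup.mrc' or die "can't write: $!";

while ( my $record = $batch->next() ) {
    my @f001 = $record->field('001');
    # keep the first 001, delete any repeats
    $record->delete_field($_) for @f001[ 1 .. $#f001 ];
    print {$out} $record->as_usmarc();
}
close $out;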
13:14 |
|
kados |
hmmm |
13:14 |
|
kados |
well it's up to you |
13:14 |
|
kados |
I owe you already for providing the records ;-) |
13:15 |
|
kados |
so a little work to tweak them isn't really a problem |
13:15 |
|
kados |
on the other hand, if you've got the proc time and bandwidth to do another export and ftp that'd be ok too ;-) |
13:15 |
|
sanspach |
I'll try to figure out good compression; re-sending them in ASCII is going to be no problem at all! (MARC's the hard part) |
13:15 |
|
kados |
gzip is pretty good |
13:15 |
|
kados |
if you'd like to cut back on bandwidth |
13:19 |
|
sanspach |
can MARC::Record read them in from LC's marcbreaker/marcmaker format?
13:21 |
|
kados |
no idea |
13:36 |
|
sanspach |
kados: just reviewed MARC::Record docs at cpan and it looks like those tools are for MARC records |
13:37 |
|
sanspach |
so you have a script that reads in flat files and does the creating? |
13:39 |
|
kados |
sort of ... I can use MARC::Record to take data in any format and feed it in to construct a valid MARC record
13:40 |
|
kados |
and export as iso2709 |
13:40 |
|
kados |
I've done this in the past for various projects |
13:40 |
|
kados |
like Koha's Z39.50 server |
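A minimal sketch of that pattern, with invented tags and data: build a MARC::Record in memory from arbitrary source data, then serialize with as_usmarc(), which emits ISO 2709 (the leader and directory are recomputed on output).

use strict;
use warnings;
use MARC::Record;
use MARC::Field;

my $record = MARC::Record->new();
# control field (tag below 010): data only, no indicators or subfields
$record->append_fields( MARC::Field->new( '001', 'demo-0001' ) );
# data field: tag, two indicators, then subfield code/value pairs
$record->append_fields(
    MARC::Field->new( '245', '0', '0', a => 'An invented title' )
);
print $record->as_usmarc();   # ISO 2709 on stdout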
13:40 |
|
sanspach |
OK; marcbreaker has three idiosyncrasies: |
13:40 |
|
sanspach |
1) each line (=field) starts with = character |
13:41 |
|
sanspach |
2) next comes tag (leader has name LDR rather than 000) and two spaces |
13:41 |
|
sanspach |
3) next comes indicators with spaces substituted by \ character (backslash) |
13:43 |
|
sanspach |
each line is thus /^=(LDR|\d{3})  (.*)$/ (two spaces after the tag)
13:43 |
|
sanspach |
with $1 being tag and |
13:44 |
|
sanspach |
$2 being data (where tag<10, all data, tag>9 /^(.)(.)(.*)$/ for ind1=$1,ind2=$2,field=$3) |
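Putting the three rules together, a hedged sketch of a MARCBreaker-to-ISO 2709 converter using MARC::Record. Two assumptions beyond what sanspach states: records are separated by blank lines, and subfields are delimited with a '$' character.

#!/usr/bin/perl
use strict;
use warnings;
use MARC::Record;
use MARC::Field;

my $record = MARC::Record->new();

# Emit the record built so far (if any) and start a fresh one.
sub flush {
    print $record->as_usmarc() if $record->fields();
    $record = MARC::Record->new();
}

while ( my $line = <> ) {
    chomp $line;
    if ( $line =~ /^\s*$/ ) { flush(); next; }       # assumed: blank line ends a record
    next unless $line =~ /^=(LDR|\d{3})  (.*)$/;     # rules 1 and 2: '=', tag, two spaces
    my ( $tag, $data ) = ( $1, $2 );
    if ( $tag eq 'LDR' ) {
        $record->leader($data);
    }
    elsif ( $tag < 10 ) {                            # control field: everything is data
        $record->append_fields( MARC::Field->new( $tag, $data ) );
    }
    else {                                           # data field: ind1, ind2, then subfields
        my ( $ind1, $ind2, $rest ) = $data =~ /^(.)(.)(.*)$/;
        for ( $ind1, $ind2 ) { $_ = ' ' if $_ eq '\\' }   # rule 3: '\' stands for blank
        my @pairs;
        for my $sf ( grep { length } split /\$/, $rest ) {   # assumed '$' delimiter
            push @pairs, substr( $sf, 0, 1 ), substr( $sf, 1 );
        }
        next unless @pairs;                          # MARC::Field needs at least one subfield
        $record->append_fields( MARC::Field->new( $tag, $ind1, $ind2, @pairs ) );
    }
}
flush();   # don't forget the last record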
13:46 |
|
sanspach |
OK, done; I've removed dup 001 (can't say for sure tag order is up to standard); text file slightly smaller |
13:47 |
|
sanspach |
than MARC file was (makes sense--no directories) |
13:47 |
|
kados |
right |
13:57 |
|
kados |
sanspach: if you take a look at http://liblime.com/zap/advanced.html you can see the latest results I'm getting from the data |
13:57 |
|
kados |
sanspach: it looks like the search is working but the data coming back isn't displaying normally |
13:57 |
|
kados |
(choose the LibLime target to see your data)
13:58 |
|
sanspach |
hmm |
13:58 |
|
kados |
(also notice that it's extremely fast ;-) |
13:58 |
|
kados |
(which is good news) |
13:58 |
|
kados |
I'd be interested in comparing its speed and results to your current system
13:58 |
|
kados |
do you have a link for that? |
14:00 |
|
sanspach |
specs for z39.50 connection are at http://kb.iu.edu/data/ajhr.html |
14:01 |
|
kados |
k ... just a sec |
14:03 |
|
kados |
heh ... ok ... try that |
14:04 |
|
kados |
so the result set numbers aren't adding up |
14:04 |
|
kados |
interestingly |
14:04 |
|
sanspach |
yeah, saw that from my favorite author search (durrenmatt) |
14:04 |
|
sanspach |
looks like field and/or record boundaries are all messed up |
14:04 |
|
kados |
yea probably |
14:05 |
|
sanspach |
maybe from multiple 001s? |
14:05 |
|
kados |
could be ... wanna send me the updated one and we'll try that? |
14:05 |
|
sanspach |
working on compressing now |
14:05 |
|
kados |
cool |
14:07 |
|
kados |
sanspach: it'll be neat to compare Indiana's Zserver to Zap/Zebra |
14:07 |
|
sanspach |
our server's big (/fast) but I'm not sure how optimized we are for z39.50 connections--that's never been very high priority |
14:08 |
|
kados |
sanspach: esp since you're prolly paying about 4-6K per year for that module |
14:09 |
|
sanspach |
mostly 'cause we think we need it (as state inst. / major research lib. / etc.) not 'cause we actually want to support it |
14:09 |
|
sanspach |
don't think we've got anybody knockin' down our door when it goes down! |
14:09 |
|
kados |
right ... still ... it'd be neat if you were able to propose cutting back on the ILS budget a bit |
14:21 |
|
sanspach |
kados: compressed file 26% of original; ftp begun but will take ca. 40 minutes |
14:22 |
|
kados |
sweet |
14:23 |
|
kados |
let me know when it's done |
14:23 |
|
kados |
(FYI the indexing takes about 4 min too) |
14:23 |
|
kados |
s/4/40/ |
14:23 |
|
sanspach |
will do |
14:24 |
|
sanspach |
still slays me it goes so fast |
15:00 |
|
sanspach |
kados: ftp is done, right on schedule; let me know if there are any problems with the file or record format |
15:02 |
|
kados |
sanspach: sweet ... I'll get started on the indexing |
15:04 |
|
kados |
unzipping now |
15:04 |
|
kados |
tar -xzvf /home/sanspach/all.tar.gz |
15:04 |
|
kados |
all.txt |
15:04 |
|
kados |
tar: Skipping to next header |
15:04 |
|
kados |
tar: Archive contains obsolescent base-64 headers |
15:07 |
|
sanspach |
working on it; google says common error; workaround should be possible... |
15:09 |
|
kados |
sanspach: tar: Read 3790 bytes from /home/sanspach/all.tar.gz |
15:10 |
|
kados |
tar: Error exit delayed from previous errors |
15:10 |
|
kados |
sanspach: any clue why that's happening? |
15:11 |
|
sanspach |
'cause I used a win32 tool to tar/gzip?! |
15:11 |
|
kados |
could be :-( |
15:11 |
|
sanspach |
workaround is to unzip first then tar, but I'm seeing an error there, too; but maybe it will finish ok |
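That two-step workaround could also be scripted in Perl with real modules; a hedged sketch (file names from the session):

use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
use Archive::Tar;

# Step 1: gunzip on its own, sidestepping tar -z and its header complaint.
gunzip 'all.tar.gz' => 'all.tar'
    or die "gunzip failed: $GunzipError";

# Step 2: untar separately.
Archive::Tar->extract_archive('all.tar')
    or die "untar failed: " . Archive::Tar->error();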
15:12 |
|
sanspach |
ls |
15:12 |
|
kados |
all.txt |
15:12 |
|
sanspach |
oops, wrong window :) |
15:12 |
|
kados |
hehe |
15:13 |
|
kados |
so all.txt is it? |
15:13 |
|
sanspach |
not good; the all.txt file should be about the same size as all.tar (ever so slightly smaller: no tar header)
15:13 |
|
sanspach |
way too small--it is choking partway through or something |
15:14 |
|
kados |
right .. |
15:14 |
|
kados |
look at the output from tail |
15:15 |
|
kados |
it's choking here: |
15:15 |
|
kados |
=505 1\[v. 1.] Theoretical and empiri |
15:15 |
|
kados |
for some reason |
15:15 |
|
sanspach |
data is probably irrelevant; most likely bad length from header, etc. |
15:16 |
|
kados |
fair enough |
15:24 |
|
sanspach |
kados: the all.tar file should be good to use if you can just strip the first few bytes |
15:24 |
|
sanspach |
maybe read in first line and dump everything before the = that is the start of the data? |
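A hedged sketch of that suggestion, streaming in fixed-size chunks so the big file never has to fit in memory (file names invented):

use strict;
use warnings;

# Copy all.tar to all.txt, skipping everything before the first '='
# (the start of the MARCBreaker data).
open my $in,  '<:raw', 'all.tar' or die "can't read: $!";
open my $out, '>:raw', 'all.txt' or die "can't write: $!";

my $found = 0;
while ( read( $in, my $buf, 65536 ) ) {
    if ( !$found ) {
        my $pos = index( $buf, '=' );
        next if $pos < 0;                 # no '=' yet; keep skipping
        $buf   = substr( $buf, $pos );
        $found = 1;
    }
    print {$out} $buf;
}
close $in;
close $out;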
15:25 |
|
sanspach |
don't know what text editing tools you might have that can handle a file that large; don't want to read it all into memory!
15:31 |
|
kados |
grep, sed, awk, bash ;-) |
15:31 |
|
kados |
perl even ;-) |
15:36 |
|
kados |
sed 's/*=//' all.tar |
15:36 |
|
kados |
I'm making a backup first |
15:36 |
|
kados |
:-) |
15:40 |
|
kados |
hmmm, seems it didn't work |
16:05 |
|
sanspach |
OK, think I've got it with perl |
16:09 |
|
kados |
sweet |
16:09 |
|
kados |
let me know when it's done uncompressing
16:12 |
|
kados |
cool ... done eh? |
16:12 |
|
sanspach |
looks like the right size... |
16:12 |
|
sanspach |
seems right |
16:12 |
|
kados |
ok ... I'm gonna index it (I'll move it first)
16:12 |
|
sanspach |
sorry for the hassle |
16:14 |
|
kados |
hmmm, strange error: 14:05:08-07/06 ../../index/zebraidx(32333) [warn] records/sample-records:0 MARC record length < 25, is 0 |
16:14 |
|
kados |
it's not indexing the file |
16:15 |
|
sanspach |
it's flat ASCII, not MARC
16:15 |
|
kados |
well that would explain it ;-) |
16:24 |
|
kados |
sanspach: so ... just so I have this straight |
16:24 |
|
kados |
the file is currently in MARCBreaker format |
16:24 |
|
kados |
you already tried using MARCMaker and it didn't produce valid MARC records |
16:25 |
|
kados |
so now we're going to try to use MARC::Record to create a valid MARC record |
16:25 |
|
kados |
sound right? |
16:25 |
|
sanspach |
well, only sort of |
16:25 |
|
sanspach |
I had separate small files which I converted into MARC |
16:26 |
|
sanspach |
I'm guessing the problem was the repeated 001's |
16:26 |
|
kados |
using MarcMaker for the conversion (and join)
16:26 |
|
kados |
right |
16:27 |
|
sanspach |
I used MarcMaker for the conversion; I joined them afterward |
16:27 |
|
kados |
ok ... how big was each file (approx)?
16:28 |
|
sanspach |
ca. 100 MB
16:54 |
|
kados |
sanspach: I'm headed home now ... I hacked together a start of a script to convert from marcmaker to usmarc using MARC::Record and I'll try to finish it up tonight |
16:55 |
|
sanspach |
OK; if I think of anything brilliant, I'll let you know :) |
16:56 |
|
kados |
sanspach: sounds good ;-) |
02:37 |
|
osmoze |
hello
02:37 |
|
paul |
hi js
02:37 |
|
osmoze |
hey paul
02:37 |
|
osmoze |
do you have two minutes?
02:38 |
|
paul |
go ahead, I'm listening
02:39 |
|
osmoze |
I have a question: is there a simple way to get the list of overdues in overdue in the form person1 --> book1,book2,book3 instead of person1 --> book1; person1 --> book2, etc.?
02:39 |
|
osmoze |
this is for a mailing
02:40 |
|
paul |
in the next version, we do have the list of overdue items.
02:40 |
|
osmoze |
I'd written a little PHP script, but I have an error I can't fix :(
02:40 |
|
osmoze |
so I'm turning to your services :)
02:40 |
|
paul |
it was an obvious gap.
02:40 |
|
osmoze |
how so?
02:40 |
|
osmoze |
there was already a module (overdue.pl)
02:41 |
|
paul |
overduenotice.pl has been improved.
02:41 |
|
paul |
it sends an email to every borrower who has an email address, giving them their list of overdues.
02:41 |
|
paul |
and it sends an email to the library listing all the borrowers who have overdues but no email address
02:41 |
|
osmoze |
the problem is that you can't make use of that data
02:42 |
|
osmoze |
from the email to the library
02:42 |
|
osmoze |
because my goal was to create a form-letter mailing and pull in the names automatically
02:42 |
|
osmoze |
however, that only works with a database or a well-structured text file
02:43 |
|
paul |
exactly. In fact, we should attach a CSV file with the info
02:43 |
|
osmoze |
exactly that
02:43 |
|
osmoze |
but there will always be the problem of repeated names
02:44 |
|
osmoze |
for borrowers who have more than one overdue book
02:46 |
|
paul |
yes. We could imagine doing it with the titles separated by a comma
02:46 |
|
paul |
they would appear on a single line in the mailing.
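A hedged sketch of that grouping (input layout invented; the real overduenotice.pl differs): collapse one-line-per-loan data into one CSV line per borrower, with the titles joined by commas.

use strict;
use warnings;

# Input (hypothetical): one "borrower;title" line per overdue loan on stdin.
# Output: one "borrower;title1,title2,..." line per borrower, for a mail merge.
my %titles_for;
while ( my $line = <> ) {
    chomp $line;
    my ( $borrower, $title ) = split /;/, $line, 2;
    push @{ $titles_for{$borrower} }, $title;
}
for my $borrower ( sort keys %titles_for ) {
    print join( ';', $borrower, join( ',', @{ $titles_for{$borrower} } ) ), "\n";
}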
02:48 |
|
osmoze |
that's exactly what I'm looking for :)
02:50 |
|
osmoze |
that way I can do a quick, efficient mailing for sending letters ^^
02:50 |
|
paul |
with OpenOffice?
02:51 |
|
paul |
if so, that doc should go into CVS.
02:51 |
|
osmoze |
I'd tested with Word; I'll test with OpenOffice
02:51 |
|
osmoze |
(the front-desk machines run Windows + Word, but I don't rule out installing openoffice-win32)
02:53 |
|
hdl_away |
hi. |
02:53 |
|
osmoze |
hello hdl |
02:54 |
|
osmoze |
paul, you wouldn't happen to have a ready-made little CSV file on hand? ^^
02:54 |
|
hdl |
osmoze: it's just a text file separated by semicolons. ;)
02:55 |
|
osmoze |
so that's fine
02:55 |
|
osmoze |
it works well
04:55 |
|
jean |
hi/hello
05:04 |
|
paul |
Wednesday is children's day AND Jean's day on Koha ;-)
05:04 |
|
paul |
hello Jean. How are you?
05:05 |
|
jean |
:) |
05:05 |
|
jean |
yes, very well
05:06 |
|
paul |
is your doc on optimization coming along well?
05:06 |
|
jean |
I think I'll release it today
05:06 |
|
paul |
great!
05:06 |
|
paul |
I can't wait to read it.
05:06 |
|
jean |
but I've only worked on it one day a week, and even then with multiple slowdowns
05:06 |
|
jean |
that's why it took a bit of time :)
05:07 |
|
jean |
well, in any case I'm counting on you to give me your opinion
05:07 |
|
paul |
you can count on it.
05:10 |
|
paul |
right, time for lunch. See you in a bit
10:29 |
|
Sylvain |
hi |
10:29 |
|
hdl |
hi Sylvain. |
10:34 |
|
Sylvain |
is it envisaged to include XML in Koha in any way?
10:35 |
|
hdl |
No, as far as I know. Are you interested in doing it ;) ? |
10:36 |
|
Sylvain |
no, just because a customer was asking ... I hadn't heard anything about it so I wondered |
10:38 |
|
paul |
sylvain : "include xml" is not enough. |
10:39 |
|
paul |
what does he want: exporting XML, importing XML, showing XML...
10:39 |
|
hdl |
... using XML?
10:39 |
|
Sylvain |
I know paul, it's not enough :) But the customer is a librarian and didn't say more in their mail. So I was asking whether anything was envisaged for XML
10:40 |
|
paul |
zebra seems to speak XML pretty well...
10:40 |
|
Sylvain |
meh, nothing very specific regarding XML then, in any case
10:41 |
|
paul |
we're in the "bazaar" phase, and the roadmap should be ready by the end of the month
10:42 |
|
Sylvain |
ok |
10:42 |
|
Sylvain |
and 2.2.3, is there a firm date? (it may have gone by on the MLs but I wasn't paying attention
10:42 |
|
Sylvain |
) |
10:43 |
|
paul |
I'll announce it for next week.
10:43 |
|
paul |
mostly translation work is left, plus possibly a bit of polishing.
10:43 |
|
paul |
(for example, I need to copy your unimarc plugins into 2.2
10:43 |
|
paul |
) |
10:45 |
|
Sylvain |
matthieu is the one who did that, but ok
11:13 |
|
owen |
Hi sanspach |
11:33 |
|
Sylvain |
can someone explain to me the meaning of "datelastseen"?
11:36 |
|
hdl |
the latest date the book was seen... for inventory purposes, IMHO.
11:36 |
|
Sylvain |
last time it was "barcoded" ? |
11:37 |
|
Sylvain |
scanned with the barcode reader ;)
11:37 |
|
sanspach |
hi owen (sorry, started IRC then walked away!) |
11:38 |
|
hdl |
Not only that; see the Inventory/StockTaking tab in the stats ;)
11:38 |
|
hdl |
English: Not only; have a look at Inventory/StockTaking in reports
11:38 |
|
Sylvain |
"01:53 +1d chris datelastseen is the last time the item was issued, or returned, or transfered between branches" |
11:39 |
|
Sylvain |
hdl the stats are too powerful and have too many things, I haven't had time yet to explore them ;)
11:39 |
|
hdl |
That's why I told you about that. |
11:40 |
|
hdl |
And *I* am not the only one to have worked on that ;) |
11:40 |
|
Sylvain |
ok, I thought you had done it all alone
11:40 |
|
hdl |
So long ;) |
11:47 |
|
owen |
sanspach, you're in Indiana? |
11:47 |
|
sanspach |
yes |
11:51 |
|
kados |
sanspach: I tried indexing the new marc file |
11:51 |
|
kados |
sanspach: results are displaying weirdly |
11:51 |
|
kados |
similar to before |
11:51 |
|
kados |
http://liblime.com/zap/advanced.html |
11:52 |
|
sanspach |
kados: yes, I see; very odd |
11:52 |
|
sanspach |
almost like the directory is off and the fields are getting all mangled |
11:52 |
|
kados |
yea |
11:53 |
|
sanspach |
only this time the MARC was generated w/MARC::Record |
11:53 |
|
kados |
right ... so it should be valid |
11:53 |
|
sanspach |
how can both (very different) methods produce the same problems? |
11:53 |
|
kados |
well it may be the indexer |
11:53 |
|
kados |
but I haven't had trouble with it using other MARC records |
11:54 |
|
sanspach |
do you want batches of smaller sections of the db? I still have the original 54 files |
11:54 |
|
kados |
sure ... send em over |
11:54 |
|
kados |
maybe if we do them one-by-one we can catch the problem |
11:55 |
|
sanspach |
I'll ftp in batches of 10, in numeric order (you'll see the pattern) |
11:55 |
|
kados |
k ... |