Time Nick Message
12:46 kados sanspach: hi there
12:46 sanspach hey!
12:46 kados sanspach: it seems I'm having some problems with the data
12:46 kados zebra is complaining when I index
12:46 kados but I don't have any details yet
12:46 kados I'm thinking of running a check on the records using MARC::Record
12:46 kados later today
12:46 sanspach there may be some usmarc rules that aren't followed
12:47 sanspach I'm thinking in particular that there may be repeated 001 fields
12:47 kados sanspach: it's strange since it gets through quite a few records before crashing
12:47 kados hmmm, that might be a problem
12:48 sanspach our system doesn't natively store in MARC and therefore doesn't enforce usmarc rules
12:48 sanspach I knew I needed to strip out our non-standard subfields
12:48 sanspach the newer records (created on the current system, ca. 4.5 yrs) would have only a single 001
12:49 sanspach but records imported from the old system went through a merge algorithm and ended up with multiple 001 fields
12:49 sanspach didn't think about it until after I'd sent the file
12:49 kados interesting
12:50 kados do you think it would be easier to modify them from the big raw file or on your end?
12:50 sanspach depends on your tools; I can't natively work with them in MARC, so I do the edits then convert
12:51 sanspach if you can work with them in MARC, it might be better for you to manipulate within the file
12:52 sanspach also, when I looked at bits of the files I noticed that fields aren't always in usmarc order--
12:52 sanspach specifically 008 seems to be in odd places (sometimes at the end)
12:54 kados I'll give it a shot and if I can't do it I'll let you know
12:55 sanspach great; I've got the data all extracted now, so it will just be a matter of re-parsing the records and converting
12:55 kados sweet
13:04 kados sanspach: if it's not extracted as MARC, what is it extracted as?
13:04 kados (out of curiosity)
13:04 sanspach flat ascii!
(I live and die by perl but I use activestate on win32, so the Marc::... modules aren't available)
13:10 kados sanspach: right
13:10 kados sanspach: so how big is the flat ascii file?
13:10 kados sanspach: it might actually be easier for me to import that with MARC::Record (as it will automatically enforce correct MARC syntax)
13:11 sanspach don't have one (merged them after converting to MARC) but could easily do it; in fact,
13:11 sanspach I could remove the duplicate 001's as I'm merging
13:14 kados hmmm
13:14 kados well it's up to you
13:14 kados I owe you already for providing the records ;-)
13:15 kados so a little work to tweak them isn't really a problem
13:15 kados on the other hand, if you've got the proc time and bandwidth to do another export and ftp, that'd be ok too ;-)
13:15 sanspach I'll try to figure out good compression; re-sending them in ASCII is going to be no problem at all! (MARC's the hard part)
13:15 kados gzip is pretty good
13:15 kados if you'd like to cut back on bandwidth
13:19 sanspach can MARC::Record read them in from LC's marcbreaker/marcmaker format?
13:21 kados no idea
13:36 sanspach kados: just reviewed the MARC::Record docs at cpan and it looks like those tools are for MARC records
13:37 sanspach so you have a script that reads in flat files and does the creating?
13:39 kados sort of ...
I can use MARC::Record to take data in any format and feed it in to construct a valid MARC record
13:40 kados and export as iso2709
13:40 kados I've done this in the past for various projects
13:40 kados like Koha's Z39.50 server
13:40 sanspach OK; marcbreaker has three idiosyncrasies:
13:40 sanspach 1) each line (=field) starts with the = character
13:41 sanspach 2) next comes the tag (the leader has the name LDR rather than 000) and two spaces
13:41 sanspach 3) next come the indicators, with spaces substituted by the \ character (backslash)
13:43 sanspach each line is thus /^=(LDR|\d{3})  (.*)$/
13:43 sanspach with $1 being the tag and
13:44 sanspach $2 being the data (where tag<10, all data; where tag>9, /^(.)(.)(.*)$/ for ind1=$1, ind2=$2, field=$3)
13:46 sanspach OK, done; I've removed the dup 001s (can't say for sure tag order is up to standard); text file slightly smaller
13:47 sanspach than the MARC file was (makes sense--no directories)
13:47 kados right
13:57 kados sanspach: if you take a look at http://liblime.com/zap/advanced.html you can see the latest results I'm getting from the data
13:57 kados sanspach: it looks like the search is working but the data coming back isn't displaying normally
13:57 kados (choose the LibLime target to see your data)
13:58 sanspach hmm
13:58 kados (also notice that it's extremely fast ;-)
13:58 kados (which is good news)
13:58 kados i'd be interested in comparing its speed and results to your current system
13:58 kados do you have a link for that?
14:00 sanspach specs for the z39.50 connection are at http://kb.iu.edu/data/ajhr.html
14:01 kados k ... just a sec
14:03 kados heh ... ok ... try that
14:04 kados so the result set numbers aren't adding up
14:04 kados interestingly
14:04 sanspach yeah, saw that from my favorite author search (durrenmatt)
14:04 sanspach looks like field and/or record boundaries are all messed up
14:04 kados yea probably
14:05 sanspach maybe from multiple 001s?
14:05 kados could be ... wanna send me the updated one and we'll try that?
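[editor's note] The line grammar sanspach spells out above (leading =, tag of LDR or three digits, two spaces, then data, with \ standing in for a blank indicator) can be sketched as a small parser. The actual script kados wrote was Perl with MARC::Record; this is an illustrative Python equivalent, and the sample field values are made up.

```python
import re

# MARCBreaker line grammar as described in the chat:
#   "=" + tag (LDR or 3 digits) + two spaces + data;
# for tags above 009 the data starts with two indicator
# characters, where "\" stands in for a blank.
LINE_RE = re.compile(r'^=(LDR|\d{3})  (.*)$')

def parse_line(line):
    m = LINE_RE.match(line.rstrip('\n'))
    if m is None:
        raise ValueError('not a MARCBreaker field line: %r' % line)
    tag, data = m.groups()
    if tag == 'LDR' or tag < '010':
        # leader and control fields have no indicators
        return {'tag': tag, 'data': data}
    ind1, ind2, field = data[0], data[1], data[2:]
    # restore the blanks that MARCBreaker writes as backslashes
    ind1 = ' ' if ind1 == '\\' else ind1
    ind2 = ' ' if ind2 == '\\' else ind2
    return {'tag': tag, 'ind1': ind1, 'ind2': ind2, 'data': field}

print(parse_line('=245  10$aA hypothetical title'))
```

Feeding each parsed field into a record builder (MARC::Record's `append_fields` in the Perl original) and serializing yields the iso2709 output kados mentions.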
14:05 sanspach working on compressing now
14:05 kados cool
14:07 kados sanspach: it'll be neat to compare Indiana's Zserver to Zap/Zebra
14:07 sanspach our server's big (/fast) but I'm not sure how optimized we are for z39.50 connections--that's never been very high priority
14:08 kados sanspach: esp since you're prolly paying about 4-6K per year for that module
14:09 sanspach mostly 'cause we think we need it (as state inst. / major research lib. / etc.) not 'cause we actually want to support it
14:09 sanspach don't think we've got anybody knockin' down our door when it goes down!
14:09 kados right ... still ... it'd be neat if you were able to propose cutting back on the ILS budget a bit
14:21 sanspach kados: compressed file is 26% of the original; ftp begun but will take ca. 40 minutes
14:22 kados sweet
14:23 kados let me know when it's done
14:23 kados (FYI the indexing takes about 4 min too)
14:23 kados s/4/40/
14:23 sanspach will do
14:24 sanspach still slays me it goes so fast
15:00 sanspach kados: ftp is done, right on schedule; let me know if there are any problems with the file or record format
15:02 kados sanspach: sweet ... I'll get started on the indexing
15:04 kados unzipping now
15:04 kados tar -xzvf /home/sanspach/all.tar.gz
15:04 kados all.txt
15:04 kados tar: Skipping to next header
15:04 kados tar: Archive contains obsolescent base-64 headers
15:07 sanspach working on it; google says it's a common error; a workaround should be possible...
15:09 kados sanspach: tar: Read 3790 bytes from /home/sanspach/all.tar.gz
15:10 kados tar: Error exit delayed from previous errors
15:10 kados sanspach: any clue why that's happening?
15:11 sanspach 'cause I used a win32 tool to tar/gzip?!
15:11 kados could be :-(
15:11 sanspach workaround is to unzip first then tar, but I'm seeing an error there, too; but maybe it will finish ok
15:12 sanspach ls
15:12 kados all.txt
15:12 sanspach oops, wrong window :)
15:12 kados hehe
15:13 kados so all.txt is it?
15:13 sanspach not good; the all.txt file should be about the same size as all.tar (ever so slightly smaller: without the tar header)
15:13 sanspach way too small--it is choking partway through or something
15:14 kados right ..
15:14 kados look at the output from tail
15:15 kados it's choking here:
15:15 kados =505 1\[v. 1.] Theoretical and empiri
15:15 kados for some reason
15:15 sanspach the data is probably irrelevant; most likely a bad length from the header, etc.
15:16 kados fair enough
15:24 sanspach kados: the all.tar file should be good to use if you can just strip the first few bytes
15:24 sanspach maybe read in the first line and dump everything before the = that is the start of the data?
15:25 sanspach don't know what text editing tools you might have that can handle a file that large; don't want to read it all into memory!
15:31 kados grep, sed, awk, bash ;-)
15:31 kados perl even ;-)
15:36 kados sed 's/*=//' all.tar
15:36 kados I'm making a backup first
15:36 kados :-)
15:40 kados hmmm, seems it didn't work
16:05 sanspach OK, think I've got it with perl
16:09 kados sweet
16:09 kados let me know when it's done uncompressing
16:12 kados cool ... done eh?
16:12 sanspach looks like the right size...
16:12 sanspach seems right
16:12 kados ok ... I'm gonna index it (I'll move it first)
16:12 sanspach sorry for the hassle
16:14 kados hmmm, strange error: 14:05:08-07/06 ../../index/zebraidx(32333) [warn] records/sample-records:0 MARC record length < 25, is 0
16:14 kados it's not indexing the file
16:15 sanspach it's flat ascii, not marc
16:15 kados well that would explain it ;-)
16:24 kados sanspach: so ... just so I have this straight
16:24 kados the file is currently in MARCBreaker format
16:24 kados you already tried using MARCMaker and it didn't produce valid MARC records
16:25 kados so now we're going to try to use MARC::Record to create a valid MARC record
16:25 kados sound right?
16:25 sanspach well, only sort of
16:25 sanspach I had separate small files which I converted into MARC
16:26 sanspach I'm guessing the problem was the repeated 001's
16:26 kados using MarcMaker for the conversion (and join)
16:26 kados right
16:27 sanspach I used MarcMaker for the conversion; I joined them afterward
16:27 kados ok ... how big was each file (approx)
16:28 sanspach 100mb
16:54 kados sanspach: I'm headed home now ... I hacked together the start of a script to convert from marcmaker to usmarc using MARC::Record and I'll try to finish it up tonight
16:55 sanspach OK; if I think of anything brilliant, I'll let you know :)
16:56 kados sanspach: sounds good ;-)
02:37 osmoze hello
02:37 paul hi js
02:37 osmoze hey paul
02:37 osmoze got two minutes?
02:38 paul go ahead, I'm listening
02:39 osmoze I have a question: is there a simple way to get the overdues list in the form person1 --> book1, book2, book3 instead of person1 --> book1; person1 --> book2 etc. etc.
02:39 osmoze this is for a mailing
02:40 paul in the next version we do have the list of overdue items.
02:40 osmoze I had written a little php script, but I have an error I can't fix :(
02:40 osmoze so I'm turning to your services :)
02:40 paul it was an obvious gap.
02:40 osmoze how so?
02:40 osmoze there was already a module (overdue.pl)
02:41 paul overduenotice.pl has been improved.
02:41 paul it sends an email to every borrower who has an email address, giving them their list of overdues.
02:41 paul and it sends an email to the library listing all the borrowers who have overdues but no email address
02:41 osmoze the problem is that you can't reuse that data
02:42 osmoze from the email to the library
02:42 osmoze because my goal was to create a mail-merge form letter and include the names automatically
02:42 osmoze however, that only works with a database or a well-structured text file
02:43 paul exactly.
Actually, we should attach a CSV file with the info
02:43 osmoze exactly that
02:43 osmoze but there will still be the problem of duplicated names
02:44 osmoze for borrowers who have more than one overdue book
02:46 paul yes. We could imagine doing it with the titles separated by a ,
02:46 paul they would appear on a single line in the mailing.
02:48 osmoze that's exactly what I'm looking for :)
02:50 osmoze that way I can do a quick and efficient mail merge for sending letters ^^
02:50 paul with OpenOffice?
02:51 paul if so, we'll need to put that doc in CVS.
02:51 osmoze I had tested with word, I'll test with openoffice
02:51 osmoze (the front-desk machines run windows + word, but I don't rule out installing openoffice-win32)
02:53 hdl_away hi.
02:53 osmoze hello hdl
02:54 osmoze paul, you wouldn't happen to have a ready-made little csv file at hand? ^^
02:54 hdl osmoze: it's just a text file separated by semicolons. ;)
02:55 osmoze so that's fine
02:55 osmoze it works well
04:55 jean hi/hello
05:04 paul Wednesday is children's day AND Jean's day on Koha ;-)
05:04 paul hello Jean. How are you?
05:05 jean :)
05:05 jean very well, thanks
05:06 paul is your doc on optimization coming along well?
05:06 jean I think I'll release it today
05:06 paul great!
05:06 paul I can't wait to read it.
05:06 jean but I've only worked on it one day a week, and even then with lots of slowdowns
05:06 jean that's why it took a while :)
05:07 jean anyway, I'm counting on you to give me your opinion
05:07 paul you can count on it.
05:10 paul right, lunchtime. See you in a bit
10:29 Sylvain hi
10:29 hdl hi Sylvain.
10:34 Sylvain is it envisaged to include xml in koha in any way?
10:35 hdl No, as far as I know. Are you interested in doing it ;) ?
10:36 Sylvain no, just because a customer was asking ...
I hadn't heard anything about it so I wondered
10:38 paul sylvain: "include xml" is not enough.
10:39 paul what does he want: exporting XML, importing xml, showing xml...
10:39 hdl ... using xml?
10:39 Sylvain I know paul it's not enough :) But the customer is a librarian and didn't say more in his mail. So I was asking if anything was envisaged in xml
10:40 paul zebra seems to speak xml pretty well...
10:40 Sylvain meh, nothing very specific concerning xml then, in any case
10:41 paul we're in the "bazaar" phase, and the roadmap should be ready by the end of the month
10:42 Sylvain ok
10:42 Sylvain and 2.2.3, a firm date? (it may have gone by on the mailing lists but I wasn't paying attention)
10:43 paul I'm going to announce it for next week.
10:43 paul mostly translation left to do, and possibly some polishing.
10:43 paul (for example, I need to copy your unimarc plugins into 2.2)
10:45 Sylvain matthieu is the one who did that, but ok
11:13 owen Hi sanspach
11:33 Sylvain can someone explain to me the meaning of "datelastseen"?
11:36 hdl the latest date when you saw the book... For inventory purposes, IMHO.
11:36 Sylvain last time it was "barcoded"?
11:37 Sylvain scanned with the barcode reader ;)
11:37 sanspach hi owen (sorry, started IRC then walked away!)
11:38 hdl Not only that, see the Inventory/Stocktaking tab of the stats ;)
11:38 hdl English: Not only, have a look at Inventory/StockTaking in reports
11:38 Sylvain "01:53 +1d chris datelastseen is the last time the item was issued, or returned, or transferred between branches"
11:39 Sylvain hdl the stats are too powerful and have too many things, I haven't had time yet to explore them ;)
11:39 hdl That's why I told you about that.
11:40 hdl And *I* am not the only one to have worked on that ;)
11:40 Sylvain ok, I thought you had done it all alone
11:40 hdl So long ;)
11:47 owen sanspach, you're in Indiana?
11:47 sanspach yes
11:51 kados sanspach: I tried indexing the new marc file
11:51 kados sanspach: results are displaying weirdly
11:51 kados similar to before
11:51 kados http://liblime.com/zap/advanced.html
11:52 sanspach kados: yes, I see; very odd
11:52 sanspach almost like the directory is off and the fields are getting all mangled
11:52 kados yea
11:53 sanspach only this time the MARC was generated w/MARC::Record
11:53 kados right ... so it should be valid
11:53 sanspach how can both (very different) methods produce the same problems?
11:53 kados well it may be the indexer
11:53 kados but I haven't had trouble with it using other MARC records
11:54 sanspach do you want batches of smaller sections of the db? I still have the original 54 files
11:54 kados sure ... send em over
11:54 kados maybe if we do them one-by-one we can catch the problem
11:55 sanspach I'll ftp in batches of 10, in numeric order (you'll see the pattern)
11:55 kados k ...
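[editor's note] Sanspach's hunch that "the directory is off" can be tested independently of any MARC library: in ISO 2709 output, bytes 0-4 of each record's leader state the total record length, and each record must end with the record terminator byte 0x1D. A sketch of that framing check in Python (the actual debugging here used MARC::Record in Perl; this is illustrative only):

```python
# Walk a file of concatenated ISO 2709 / MARC records and report the
# index of the first record whose stated length does not land on a
# record terminator -- the kind of off-by-some-bytes error that would
# mangle every field after it, as seen in the search results above.
RT = b'\x1d'  # ISO 2709 record terminator

def check_marc_boundaries(data):
    pos, n = 0, 0
    while pos < len(data):
        # leader bytes 0-4: total record length, zero-padded ASCII digits
        length = int(data[pos:pos + 5].decode('ascii'))
        record = data[pos:pos + length]
        if not record.endswith(RT):
            return n          # index of the first misframed record
        pos += length
        n += 1
    return None               # every record's framing is consistent
```

A `None` result would point the finger back at the indexer; an early index would confirm that the conversion (or the duplicate 001s) corrupted the record directory.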