Time |
S |
Nick |
Message |
12:46 |
|
kados |
sanspach: hi there |
12:46 |
|
sanspach |
hey! |
12:46 |
|
kados |
sanspach: it seems I'm having some problems with the data |
12:46 |
|
kados |
zebra is complaining when I index |
12:46 |
|
kados |
but I don't have any details yet |
12:46 |
|
kados |
I'm thinking of running a check on the records using MARC::Record |
12:46 |
|
kados |
later today |
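A hedged sketch of the kind of check kados describes: MARC::Batch (part of the MARC::Record distribution) reads a file of USMARC/ISO 2709 records and accumulates warnings for any that break the rules. The file name is invented.

#!/usr/bin/perl
use strict;
use warnings;
use MARC::Batch;

# Walk a file of ISO 2709 records and report whatever MARC::Batch complains about.
my $batch = MARC::Batch->new( 'USMARC', 'records.mrc' );   # hypothetical file name
$batch->strict_off();   # log bad records and keep going instead of dying

my $count = 0;
while ( my $record = $batch->next() ) {
    $count++;
    for my $warning ( $batch->warnings() ) {
        print "record $count: $warning\n";
    }
}
print "checked $count records\n";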
12:46 |
|
sanspach |
there may be some usmarc rules that aren't followed |
12:47 |
|
sanspach |
I'm thinking in particular that there may be repeated 001 fields |
12:47 |
|
kados |
sanspach: it's strange since it gets through quite a few records before crashing |
12:47 |
|
kados |
hmmm, that might be a problem |
12:48 |
|
sanspach |
our system doesn't natively store in MARC and therefore doesn't enforce usmarc rules |
12:48 |
|
sanspach |
I knew I needed to strip out our non-standard subfields |
12:48 |
|
sanspach |
the newer records (created on the current system, ca. 4.5 yrs) would have only a single 001
12:49 |
|
sanspach |
but records imported from old system went through a merge algorithm and ended up with multiple 001 fields |
12:49 |
|
sanspach |
didn't think about it until after I'd sent the file |
12:49 |
|
kados |
interesting |
12:50 |
|
kados |
do you think it would be easier to modify them in the big raw file or on your end?
12:50 |
|
sanspach |
depends on your tools; I can't natively work with them in MARC, so I do the edits then convert |
12:51 |
|
sanspach |
if you can work with them in MARC, it might be better for you to manipulate within the file |
12:52 |
|
sanspach |
also, when I looked at bits of the files I noticed that fields aren't always in usmarc order-- |
12:52 |
|
sanspach |
specifically 008 seems to be in odd places (sometimes at end) |
12:54 |
|
kados |
I'll give it a shot and if I can't do it I'll let you know |
12:55 |
|
sanspach |
great; I've got the data all extracted now, so it will just be a matter of re-parsing the records and converting |
12:55 |
|
kados |
sweet |
13:04 |
|
kados |
sanspach: if it's not extracted as MARC what is it extracted as? |
13:04 |
|
kados |
(out of curiosity) |
13:04 |
|
sanspach |
flat ASCII! (I live and die by Perl, but I use ActiveState on Win32, so the MARC:: modules aren't available)
13:10 |
|
kados |
sanspach: right |
13:10 |
|
kados |
sanspach: so how big is the flat ascii file? |
13:10 |
|
kados |
sanspach: it might actually be easier for me to import that with MARC::Record (as it will automatically enforce correct MARC syntax) |
13:11 |
|
sanspach |
don't have one (merged them after converting to MARC) but could easily do it; in fact, |
13:11 |
|
sanspach |
I could remove the duplicate 001's as I'm merging |
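On the MARC side, stripping the repeated 001s is straightforward with MARC::Record; a hedged sketch (file names invented):

use strict;
use warnings;
use MARC::Batch;

my $batch = MARC::Batch->new( 'USMARC', 'all.mrc' );       # hypothetical input
open my $out, '>', 'all-dedup.mrc' or die "can't write: $!";

while ( my $record = $batch->next() ) {
    my @f001 = $record->field('001');
    # keep the first 001, delete any repeats
    $record->delete_field($_) for @f001[ 1 .. $#f001 ];
    print {$out} $record->as_usmarc();
}
close $out;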
13:14 |
|
kados |
hmmm |
13:14 |
|
kados |
well it's up to you |
13:14 |
|
kados |
I owe you already for providing the records ;-) |
13:15 |
|
kados |
so a little work to tweak them isn't really a problem |
13:15 |
|
kados |
on the other hand, if you've got the proc time and bandwidth to do another export and ftp that'd be ok too ;-) |
13:15 |
|
sanspach |
I'll try to figure out good compression; re-sending them in ASCII is going to be no problem at all! (MARC's the hard part) |
13:15 |
|
kados |
gzip is pretty good |
13:15 |
|
kados |
if you'd like to cut back on bandwidth |
13:19 |
|
sanspach |
can MARC::Record read them in from LC's marcbreaker/marcmaker format?
13:21 |
|
kados |
no idea |
13:36 |
|
sanspach |
kados: just reviewed MARC::Record docs at cpan and it looks like those tools are for MARC records |
13:37 |
|
sanspach |
so you have a script that reads in flat files and does the creating? |
13:39 |
|
kados |
sort of ... I can use MARC::Record to take data in any format and feed it in to construct a valid MARC record
13:40 |
|
kados |
and export as iso2709 |
13:40 |
|
kados |
I've done this in the past for various projects |
13:40 |
|
kados |
like Koha's Z39.50 server |
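A minimal sketch of that pattern, with invented tags and data: build a MARC::Record in memory from arbitrary source data, then serialize with as_usmarc(), which emits ISO 2709 (the leader and directory are recomputed on output).

use strict;
use warnings;
use MARC::Record;
use MARC::Field;

my $record = MARC::Record->new();
# control field (tag below 010): data only, no indicators or subfields
$record->append_fields( MARC::Field->new( '001', 'demo-0001' ) );
# data field: tag, two indicators, then subfield code/value pairs
$record->append_fields(
    MARC::Field->new( '245', '0', '0', a => 'An invented title' )
);
print $record->as_usmarc();   # ISO 2709 on stdout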
13:40 |
|
sanspach |
OK; marcbreaker has three idiosyncrasies: |
13:40 |
|
sanspach |
1) each line (=field) starts with = character |
13:41 |
|
sanspach |
2) next comes tag (leader has name LDR rather than 000) and two spaces |
13:41 |
|
sanspach |
3) next comes indicators with spaces substituted by \ character (backslash) |
13:43 |
|
sanspach |
each line is thus /^=(LDR|\d{3})  (.*)$/ (two spaces after the tag)
13:43 |
|
sanspach |
with $1 being tag and |
13:44 |
|
sanspach |
$2 being data (where tag<10, all data, tag>9 /^(.)(.)(.*)$/ for ind1=$1,ind2=$2,field=$3) |
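Putting the three rules together, a hedged sketch of a MARCBreaker-to-ISO 2709 converter using MARC::Record. Two assumptions beyond what sanspach states: records are separated by blank lines, and subfields are delimited with a '$' character.

#!/usr/bin/perl
use strict;
use warnings;
use MARC::Record;
use MARC::Field;

my $record = MARC::Record->new();

# Emit the record built so far (if any) and start a fresh one.
sub flush {
    print $record->as_usmarc() if $record->fields();
    $record = MARC::Record->new();
}

while ( my $line = <> ) {
    chomp $line;
    if ( $line =~ /^\s*$/ ) { flush(); next; }       # assumed: blank line ends a record
    next unless $line =~ /^=(LDR|\d{3})  (.*)$/;     # rules 1 and 2: '=', tag, two spaces
    my ( $tag, $data ) = ( $1, $2 );
    if ( $tag eq 'LDR' ) {
        $record->leader($data);
    }
    elsif ( $tag < 10 ) {                            # control field: everything is data
        $record->append_fields( MARC::Field->new( $tag, $data ) );
    }
    else {                                           # data field: ind1, ind2, then subfields
        my ( $ind1, $ind2, $rest ) = $data =~ /^(.)(.)(.*)$/;
        for ( $ind1, $ind2 ) { $_ = ' ' if $_ eq '\\' }   # rule 3: '\' stands for blank
        my @pairs;
        for my $sf ( grep { length } split /\$/, $rest ) {   # assumed '$' delimiter
            push @pairs, substr( $sf, 0, 1 ), substr( $sf, 1 );
        }
        next unless @pairs;                          # MARC::Field needs at least one subfield
        $record->append_fields( MARC::Field->new( $tag, $ind1, $ind2, @pairs ) );
    }
}
flush();   # don't forget the last record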
13:46 |
|
sanspach |
OK, done; I've removed dup 001 (can't say for sure tag order is up to standard); text file slightly smaller |
13:47 |
|
sanspach |
than MARC file was (makes sense--no directories) |
13:47 |
|
kados |
right |
13:57 |
|
kados |
sanspach: if you take a look at http://liblime.com/zap/advanced.html you can see the latest results I'm getting from the data |
13:57 |
|
kados |
sanspach: it looks like the search is working but the data coming back isn't displaying normally |
13:57 |
|
kados |
(choose the LibLime target to see your data)
13:58 |
|
sanspach |
hmm |
13:58 |
|
kados |
(also notice that it's extremely fast ;-) |
13:58 |
|
kados |
(which is good news) |
13:58 |
|
kados |
I'd be interested in comparing its speed and results to your current system
13:58 |
|
kados |
do you have a link for that? |
14:00 |
|
sanspach |
specs for z39.50 connection are at http://kb.iu.edu/data/ajhr.html |
14:01 |
|
kados |
k ... just a sec |
14:03 |
|
kados |
heh ... ok ... try that |
14:04 |
|
kados |
so the result set numbers aren't adding up |
14:04 |
|
kados |
interestingly |
14:04 |
|
sanspach |
yeah, saw that from my favorite author search (durrenmatt) |
14:04 |
|
sanspach |
looks like field and/or record boundaries are all messed up |
14:04 |
|
kados |
yea probably |
14:05 |
|
sanspach |
maybe from multiple 001s? |
14:05 |
|
kados |
could be ... wanna send me the updated one and we'll try that? |
14:05 |
|
sanspach |
working on compressing now |
14:05 |
|
kados |
cool |
14:07 |
|
kados |
sanspach: it'll be neat to compare Indiana's Zserver to Zap/Zebra |
14:07 |
|
sanspach |
our server's big (/fast) but I'm not sure how optimized we are for z39.50 connections--that's never been very high priority |
14:08 |
|
kados |
sanspach: esp since you're prolly paying about 4-6K per year for that module |
14:09 |
|
sanspach |
mostly 'cause we think we need it (as state inst. / major research lib. / etc.) not 'cause we actually want to support it |
14:09 |
|
sanspach |
don't think we've got anybody knockin' down our door when it goes down! |
14:09 |
|
kados |
right ... still ... it'd be neat if you were able to propose cutting back on the ILS budget a bit |
14:21 |
|
sanspach |
kados: compressed file 26% of original; ftp begun but will take ca. 40 minutes |
14:22 |
|
kados |
sweet |
14:23 |
|
kados |
let me know when it's done |
14:23 |
|
kados |
(FYI the indexing takes about 4 min too) |
14:23 |
|
kados |
s/4/40/ |
14:23 |
|
sanspach |
will do |
14:24 |
|
sanspach |
still slays me it goes so fast |
15:00 |
|
sanspach |
kados: ftp is done, right on schedule; let me know if there are any problems with the file or record format |
15:02 |
|
kados |
sanspach: sweet ... I'll get started on the indexing |
15:04 |
|
kados |
unzipping now |
15:04 |
|
kados |
tar -xzvf /home/sanspach/all.tar.gz |
15:04 |
|
kados |
all.txt |
15:04 |
|
kados |
tar: Skipping to next header |
15:04 |
|
kados |
tar: Archive contains obsolescent base-64 headers |
15:07 |
|
sanspach |
working on it; google says common error; workaround should be possible... |
15:09 |
|
kados |
sanspach: tar: Read 3790 bytes from /home/sanspach/all.tar.gz |
15:10 |
|
kados |
tar: Error exit delayed from previous errors |
15:10 |
|
kados |
sanspach: any clue why that's happening? |
15:11 |
|
sanspach |
'cause I used a win32 tool to tar/gzip?! |
15:11 |
|
kados |
could be :-( |
15:11 |
|
sanspach |
workaround is to unzip first then tar, but I'm seeing an error there, too; but maybe it will finish ok |
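That two-step workaround could also be scripted in Perl with real modules; a hedged sketch (file names from the session):

use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
use Archive::Tar;

# Step 1: gunzip on its own, sidestepping tar -z and its header complaint.
gunzip 'all.tar.gz' => 'all.tar'
    or die "gunzip failed: $GunzipError";

# Step 2: untar separately.
Archive::Tar->extract_archive('all.tar')
    or die "untar failed: " . Archive::Tar->error();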
15:12 |
|
sanspach |
ls |
15:12 |
|
kados |
all.txt |
15:12 |
|
sanspach |
oops, wrong window :) |
15:12 |
|
kados |
hehe |
15:13 |
|
kados |
so all.txt is it? |
15:13 |
|
sanspach |
not good; the all.txt file should be about the same size as all.tar (ever so slightly smaller: no tar header)
15:13 |
|
sanspach |
way too small--it is choking partway through or something |
15:14 |
|
kados |
right .. |
15:14 |
|
kados |
look at the output from tail |
15:15 |
|
kados |
it's choking here: |
15:15 |
|
kados |
=505 1\[v. 1.] Theoretical and empiri |
15:15 |
|
kados |
for some reason |
15:15 |
|
sanspach |
data is probably irrelevant; most likely bad length from header, etc. |
15:16 |
|
kados |
fair enough |
15:24 |
|
sanspach |
kados: the all.tar file should be good to use if you can just strip the first few bytes |
15:24 |
|
sanspach |
maybe read in first line and dump everything before the = that is the start of the data? |
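A hedged sketch of that suggestion, streaming in fixed-size chunks so the big file never has to fit in memory (file names invented):

use strict;
use warnings;

# Copy all.tar to all.txt, skipping everything before the first '='
# (the start of the MARCBreaker data).
open my $in,  '<:raw', 'all.tar' or die "can't read: $!";
open my $out, '>:raw', 'all.txt' or die "can't write: $!";

my $found = 0;
while ( read( $in, my $buf, 65536 ) ) {
    if ( !$found ) {
        my $pos = index( $buf, '=' );
        next if $pos < 0;                 # no '=' yet; keep skipping
        $buf   = substr( $buf, $pos );
        $found = 1;
    }
    print {$out} $buf;
}
close $in;
close $out;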
15:25 |
|
sanspach |
don't know what text editing tools you might have that can handle a file that large; don't want to read it all into memory!
15:31 |
|
kados |
grep, sed, awk, bash ;-) |
15:31 |
|
kados |
perl even ;-) |
15:36 |
|
kados |
sed 's/*=//' all.tar |
15:36 |
|
kados |
I'm making a backup first |
15:36 |
|
kados |
:-) |
15:40 |
|
kados |
hmmm, seems it didn't work |
16:05 |
|
sanspach |
OK, think I've got it with perl |
16:09 |
|
kados |
sweet |
16:09 |
|
kados |
let me know when it's done uncompressing
16:12 |
|
kados |
cool ... done eh? |
16:12 |
|
sanspach |
looks like the right size... |
16:12 |
|
sanspach |
seems right |
16:12 |
|
kados |
ok ... I'm gonna index it (I'll move it first)
16:12 |
|
sanspach |
sorry for the hassle |
16:14 |
|
kados |
hmmm, strange error: 14:05:08-07/06 ../../index/zebraidx(32333) [warn] records/sample-records:0 MARC record length < 25, is 0 |
16:14 |
|
kados |
it's not indexing the file |
16:15 |
|
sanspach |
it's flat ASCII, not MARC
16:15 |
|
kados |
well that would explain it ;-) |
16:24 |
|
kados |
sanspach: so ... just so I have this straight |
16:24 |
|
kados |
the file is currently in MARCBreaker format |
16:24 |
|
kados |
you already tried using MARCMaker and it didn't produce valid MARC records |
16:25 |
|
kados |
so now we're going to try to use MARC::Record to create a valid MARC record |
16:25 |
|
kados |
sound right? |
16:25 |
|
sanspach |
well, only sort of |
16:25 |
|
sanspach |
I had separate small files which I converted into MARC |
16:26 |
|
sanspach |
I'm guessing the problem was the repeated 001's |
16:26 |
|
kados |
using MarcMaker for the conversion (and join)
16:26 |
|
kados |
right |
16:27 |
|
sanspach |
I used MarcMaker for the conversion; I joined them afterward |
16:27 |
|
kados |
ok ... how big was each file (approx)?
16:28 |
|
sanspach |
ca. 100 MB
16:54 |
|
kados |
sanspach: I'm headed home now ... I hacked together a start of a script to convert from marcmaker to usmarc using MARC::Record and I'll try to finish it up tonight |
16:55 |
|
sanspach |
OK; if I think of anything brilliant, I'll let you know :) |
16:56 |
|
kados |
sanspach: sounds good ;-) |
02:37 |
|
osmoze |
hello
02:37 |
|
paul |
hi js
02:37 |
|
osmoze |
hey paul
02:37 |
|
osmoze |
do you have two minutes?
02:38 |
|
paul |
go ahead, I'm listening
02:39 |
|
osmoze |
I have a question: is there a simple way to get the list of overdues in overdue in the form person1 --> book1,book2,book3 instead of person1 --> book1; person1 --> book2, etc.?
02:39 |
|
osmoze |
this is for a mailing
02:40 |
|
paul |
in the next version, we do have the list of overdue items.
02:40 |
|
osmoze |
I'd written a little PHP script, but I have an error I can't fix :(
02:40 |
|
osmoze |
so I'm turning to your services :)
02:40 |
|
paul |
it was an obvious gap.
02:40 |
|
osmoze |
how so?
02:40 |
|
osmoze |
there was already a module (overdue.pl)
02:41 |
|
paul |
overduenotice.pl has been improved.
02:41 |
|
paul |
it sends an email to every borrower who has an email address, giving them their list of overdues.
02:41 |
|
paul |
and it sends an email to the library listing all the borrowers who have overdues but no email address
02:41 |
|
osmoze |
the problem is that you can't make use of that data
02:42 |
|
osmoze |
from the email to the library
02:42 |
|
osmoze |
because my goal was to create a form-letter mailing and pull in the names automatically
02:42 |
|
osmoze |
however, that only works with a database or a well-structured text file
02:43 |
|
paul |
exactly. In fact, we should attach a CSV file with the info
02:43 |
|
osmoze |
exactly that
02:43 |
|
osmoze |
but there will always be the problem of repeated names
02:44 |
|
osmoze |
for borrowers who have more than one overdue book
02:46 |
|
paul |
yes. We could imagine doing it with the titles separated by a comma
02:46 |
|
paul |
they would appear on a single line in the mailing.
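A hedged sketch of that grouping (input layout invented; the real overduenotice.pl differs): collapse one-line-per-loan data into one CSV line per borrower, with the titles joined by commas.

use strict;
use warnings;

# Input (hypothetical): one "borrower;title" line per overdue loan on stdin.
# Output: one "borrower;title1,title2,..." line per borrower, for a mail merge.
my %titles_for;
while ( my $line = <> ) {
    chomp $line;
    my ( $borrower, $title ) = split /;/, $line, 2;
    push @{ $titles_for{$borrower} }, $title;
}
for my $borrower ( sort keys %titles_for ) {
    print join( ';', $borrower, join( ',', @{ $titles_for{$borrower} } ) ), "\n";
}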
02:48 |
|
osmoze |
that's exactly what I'm looking for :)
02:50 |
|
osmoze |
that way I can do a quick, efficient mailing for sending letters ^^
02:50 |
|
paul |
with OpenOffice?
02:51 |
|
paul |
if so, that doc should go into CVS.
02:51 |
|
osmoze |
I'd tested with Word; I'll test with OpenOffice
02:51 |
|
osmoze |
(the front-desk machines run Windows + Word, but I don't rule out installing openoffice-win32)
02:53 |
|
hdl_away |
hi. |
02:53 |
|
osmoze |
hello hdl |
02:54 |
|
osmoze |
paul, you wouldn't happen to have a ready-made little CSV file on hand? ^^
02:54 |
|
hdl |
osmoze: it's just a text file separated by semicolons. ;)
02:55 |
|
osmoze |
so that's fine
02:55 |
|
osmoze |
it works well
04:55 |
|
jean |
hi/hello
05:04 |
|
paul |
Wednesday is children's day AND Jean's day on Koha ;-)
05:04 |
|
paul |
hello Jean. How are you?
05:05 |
|
jean |
:) |
05:05 |
|
jean |
yes, very well
05:06 |
|
paul |
is your doc on optimization coming along well?
05:06 |
|
jean |
I think I'll release it today
05:06 |
|
paul |
great!
05:06 |
|
paul |
I can't wait to read it.
05:06 |
|
jean |
but I've only worked on it one day a week, and even then with multiple slowdowns
05:06 |
|
jean |
that's why it took a bit of time :)
05:07 |
|
jean |
well, in any case I'm counting on you to give me your opinion
05:07 |
|
paul |
you can count on it.
05:10 |
|
paul |
right, time for lunch. See you in a bit
10:29 |
|
Sylvain |
hi |
10:29 |
|
hdl |
hi Sylvain. |
10:34 |
|
Sylvain |
is it envisaged to include XML in Koha in any way?
10:35 |
|
hdl |
No, as far as I know. Are you interested in doing it ;) ? |
10:36 |
|
Sylvain |
no, just because a customer was asking ... I hadn't heard anything about it so I wondered |
10:38 |
|
paul |
sylvain : "include xml" is not enough. |
10:39 |
|
paul |
what does he want: exporting XML, importing XML, showing XML...
10:39 |
|
hdl |
... using XML?
10:39 |
|
Sylvain |
I know paul, it's not enough :) But the customer is a librarian and didn't say more in their mail. So I was asking whether anything was envisaged for XML
10:40 |
|
paul |
zebra seems to speak XML pretty well...
10:40 |
|
Sylvain |
meh, nothing very specific regarding XML then, in any case
10:41 |
|
paul |
we're in the "bazaar" phase, and the roadmap should be ready by the end of the month
10:42 |
|
Sylvain |
ok |
10:42 |
|
Sylvain |
and 2.2.3, is there a firm date? (it may have gone by on the MLs but I wasn't paying attention
10:42 |
|
Sylvain |
) |
10:43 |
|
paul |
I'll announce it for next week.
10:43 |
|
paul |
mostly translation work is left, plus possibly a bit of polishing.
10:43 |
|
paul |
(for example, I need to copy your unimarc plugins into 2.2
10:43 |
|
paul |
) |
10:45 |
|
Sylvain |
matthieu is the one who did that, but ok
11:13 |
|
owen |
Hi sanspach |
11:33 |
|
Sylvain |
can someone explain to me the meaning of "datelastseen"?
11:36 |
|
hdl |
the latest date the book was seen... for inventory purposes, IMHO.
11:36 |
|
Sylvain |
last time it was "barcoded" ? |
11:37 |
|
Sylvain |
scanned with the barcode reader ;)
11:37 |
|
sanspach |
hi owen (sorry, started IRC then walked away!) |
11:38 |
|
hdl |
Not only that; see the Inventory/StockTaking tab in the stats ;)
11:38 |
|
hdl |
English: Not only; have a look at Inventory/StockTaking in reports
11:38 |
|
Sylvain |
"01:53 +1d chris datelastseen is the last time the item was issued, or returned, or transfered between branches" |
11:39 |
|
Sylvain |
hdl the stats are too powerful and have too many things, I haven't had time yet to explore them ;)
11:39 |
|
hdl |
That's why I told you about that. |
11:40 |
|
hdl |
And *I* am not the only one to have worked on that ;) |
11:40 |
|
Sylvain |
ok, I thought you had done it all alone
11:40 |
|
hdl |
So long ;) |
11:47 |
|
owen |
sanspach, you're in Indiana? |
11:47 |
|
sanspach |
yes |
11:51 |
|
kados |
sanspach: I tried indexing the new marc file |
11:51 |
|
kados |
sanspach: results are displaying weirdly |
11:51 |
|
kados |
similar to before |
11:51 |
|
kados |
http://liblime.com/zap/advanced.html |
11:52 |
|
sanspach |
kados: yes, I see; very odd |
11:52 |
|
sanspach |
almost like the directory is off and the fields are getting all mangled |
11:52 |
|
kados |
yea |
11:53 |
|
sanspach |
only this time the MARC was generated w/MARC::Record |
11:53 |
|
kados |
right ... so it should be valid |
11:53 |
|
sanspach |
how can both (very different) methods produce the same problems? |
11:53 |
|
kados |
well it may be the indexer |
11:53 |
|
kados |
but I haven't had trouble with it using other MARC records |
11:54 |
|
sanspach |
do you want batches of smaller sections of the db? I still have the original 54 files |
11:54 |
|
kados |
sure ... send em over |
11:54 |
|
kados |
maybe if we do them one-by-one we can catch the problem |
11:55 |
|
sanspach |
I'll ftp in batches of 10, in numeric order (you'll see the pattern) |
11:55 |
|
kados |
k ... |