Time |
S |
Nick |
Message |
15:01 |
|
rach |
it's halloween? |
15:02 |
|
owen |
No, I checked. |
15:02 |
|
owen |
I haven't checked for the possibility of a mummy's curse, though. |
15:10 |
|
kados |
hehe |
15:18 |
|
rach |
:-) |
15:24 |
|
rach |
hi |
15:24 |
|
sanspach |
hello |
15:28 |
|
rach |
you guys have had a busy night :-) |
15:28 |
|
rach |
well - day for you :-) |
15:28 |
|
sanspach |
yeah; still nothing solved, though :( |
15:30 |
|
rach |
you worked out the ^m tho |
15:30 |
|
rach |
they are windows line breaks |
15:30 |
|
rach |
end of line markers |
15:30 |
|
sanspach |
yeah, but still not certain which files are affected by it |
15:31 |
|
sanspach |
and not exactly clear why some of the files that don't have them still don't work |
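One way to narrow down which files (and which records) carry the stray ^M bytes is to split each file on the MARC 21 record terminator (byte 0x1D) and flag any record that contains a carriage return. This is only a sketch: the function names are hypothetical, and it assumes the files are binary MARC that are small enough to read into memory.

```python
RECORD_TERMINATOR = b"\x1d"  # MARC 21 end-of-record byte
CR = b"\r"                   # what editors display as ^M


def records_with_cr(data: bytes) -> list[int]:
    """Return 0-based indexes of records that contain a carriage return."""
    # Drop empty chunks, e.g. the one after the final record terminator.
    records = [rec for rec in data.split(RECORD_TERMINATOR) if rec]
    return [i for i, rec in enumerate(records) if CR in rec]


def report(path: str) -> list[int]:
    """Hypothetical helper: run the check on one file on disk."""
    with open(path, "rb") as fh:
        return records_with_cr(fh.read())
```

An empty result for a file that still fails to load would suggest the problem lies elsewhere, for example in the delimiters.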
15:31 |
|
rach |
and if you change one record to get rid of them that doesn't help - so make a 1 record clean file? |
15:32 |
|
rach |
but I see gavin tried that |
15:33 |
|
rach |
so gavin didn't manage to get it in? |
15:34 |
|
gavin |
hi |
15:34 |
|
rach |
hi |
15:34 |
|
gavin |
the stuff was inserting for me at the end |
15:36 |
|
gavin |
i took one of sanspach's files which he emailed me (small.sample2.mrc) and substituted one delimiter for another after which it worked |
15:36 |
|
rach |
ah cool |
15:36 |
|
gavin |
then a newer file failed due to having some weird win32 linebreaks stuck in the middle |
15:36 |
|
gavin |
no idea why they're there |
15:36 |
|
rach |
yep you'll have to take them out too, in the same sort of way |
15:36 |
|
rach |
the magic of windows :-) |
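Taking the ^M bytes out can be sketched as a streaming copy that drops every carriage return. The function names and chunk size below are assumptions for illustration, not anything used in the channel. Note that binary MARC stores byte lengths in the leader and directory, so a blind strip may leave those out of step; the log only reports that removing the ^M made the sample load.

```python
def strip_cr(data: bytes) -> bytes:
    """Remove every carriage return (^M) byte from a chunk of data."""
    return data.replace(b"\r", b"")


def strip_cr_file(src: str, dst: str, chunk_size: int = 1 << 20) -> int:
    """Copy src to dst without CR bytes; return how many were removed.

    Reads in chunks so that a multi-gigabyte file never has to fit in
    memory. CR is a single byte, so chunk boundaries cannot split it.
    """
    removed = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(chunk_size):
            removed += chunk.count(b"\r")
            fout.write(strip_cr(chunk))
    return removed
```

Writing to a separate output file keeps the original intact in case the cleaned copy still fails to import.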
15:37 |
|
gavin |
I haven't seen kados's big file, so I don't know what is wrong with what he got |
15:37 |
|
sanspach |
it is probably messed up in exactly the same way |
15:37 |
|
sanspach |
it seems that MARC::Record doesn't strip the trailing ^M from the leader field when it re-writes it |
15:38 |
|
sanspach |
or maybe I messed it up; I'll have to check |
15:38 |
|
gavin |
yes, if I remove the ^M from that one it works too |
15:38 |
|
gavin |
the ^Ms are all over the middle of records |
15:38 |
|
gavin |
it almost looks like an editor wrapped them or something |
15:39 |
|
sanspach |
they're *all* separate lines to begin with ("flat" format) |
15:39 |
|
sanspach |
but when MARC::Record writes them out, I figured all the formatting would be fixed |
15:39 |
|
gavin |
not any of the ones i've seen |
15:39 |
|
rach |
you wish :-) |
15:40 |
|
gavin |
do you mean marc format should have linebreaks? none that I've seen have them |
15:40 |
|
sanspach |
no, no just for me |
15:40 |
|
gavin |
but i know little or nothing about marc |
15:41 |
|
sanspach |
I get the data out of our system db (Oracle, but same for mysql) as separate lines |
15:41 |
|
gavin |
i see, and you patch them up together? |
15:41 |
|
sanspach |
then I put everything back together and have MARC::Record create true marc format out of them |
15:42 |
|
gavin |
Oracle. that's an expensive library system! |
15:42 |
|
sanspach |
not for a univ. that has a site license already (!) |
15:43 |
|
sanspach |
but yes, actually, Sirsi's Unicorn product isn't the cheapest out there |
15:43 |
|
gavin |
universities are indeed wonderful places |
15:43 |
|
rach |
ah well, at least it sounds like you know how to work on the data now |
15:44 |
|
gavin |
sanspach: what do you think we need to do with kados's data? |
15:45 |
|
sanspach |
rm * and start over |
15:45 |
|
gavin |
not fixable? |
15:45 |
|
sanspach |
I've lost track of what the problems might be. |
15:45 |
|
sanspach |
if it is just ^M we could strip those |
15:46 |
|
sanspach |
if it is subfield delimiters too, we could do that |
15:46 |
|
gavin |
as far as I can tell it boils down to ^M and possibly delimiter substitution which would be very quick |
15:46 |
|
gavin |
rather than go through the pain of downloading 2GB again |
15:48 |
|
sanspach |
problem is, I think the delimiter that's wrong is used elsewhere in the data, which means no global replace |
15:48 |
|
sanspach |
I think the data's got to be processed again |
15:48 |
|
gavin |
ah. |
15:49 |
|
gavin |
in that case I guess we'd better get the recreation process moving |
15:50 |
|
gavin |
would it help if we rehearsed on a small data set? |
15:50 |
|
sanspach |
definitely! |
15:51 |
|
gavin |
well if you want to give it a go and send me some stuff I'll try it out |
15:51 |
|
gavin |
then we can organise getting the 2GB batch off you |
15:52 |
|
gavin |
i have a good amount of bandwidth in my university which I can use for that |
15:55 |
|
sanspach |
OK, how should I get you the test files? I don't think putting them on my windows box and then |
15:55 |
|
sanspach |
sending them through email is good ?! |
15:56 |
|
gavin |
you were able to put it on a web server before |
15:56 |
|
gavin |
if you bzip it, windows will just treat it as a blob and it should be safe |
15:56 |
|
sanspach |
I'll work on that |
15:56 |
|
gavin |
so whatever works |
15:58 |
|
sanspach |
OK, same place: two files--one with 2 records, one with 100 |
16:01 |
|
gavin |
those seem fine to me |
16:04 |
|
sanspach |
want to try 10K ? |
16:05 |
|
gavin |
yeah if you like. whatever size |
16:05 |
|
gavin |
but start thinking about bzipping it |
16:05 |
|
gavin |
it'll save both of us time and bandwith |
16:06 |
|
gavin |
width.. |
16:06 |
|
sanspach |
gzip? |
16:06 |
|
gavin |
yeah, that's fine too; bzip2 just gets better compression (although it takes more cpu time) |
16:07 |
|
gavin |
if we step up to 2gb that'll make a whale of a difference |
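The gzip versus bzip2 trade-off can be checked on a sample before committing to the full batch. The sketch below uses Python's standard `gzip` and `bz2` modules; `compressed_sizes` is a hypothetical helper, and the actual ratios depend entirely on the data.

```python
import bz2
import gzip


def compressed_sizes(data: bytes) -> dict[str, int]:
    """Return the raw, gzip and bzip2 sizes in bytes for one sample."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data)),
        "bz2": len(bz2.compress(data)),
    }


def report(path: str) -> dict[str, int]:
    """Hypothetical helper: measure a sample file on disk."""
    with open(path, "rb") as fh:
        return compressed_sizes(fh.read())
```

Running `report` on the 100-record test file would give a rough idea of how much transfer time each format saves on the 2GB batch.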
16:08 |
|
sanspach |
don't seem to find bzip/bzip2 so I'll have to use gzip |
16:08 |
|
gavin |
no prob |
19:23 |
|
kados |
well that's a trick ;-) |
19:24 |
|
chris |
what's that then? |
22:07 |
|
sanspach |
kados: problems? |
22:14 |
|
kados |
sanspach: you still around? |
22:14 |
|
sanspach |
yeah |
22:14 |
|
kados |
sanspach: What's the deal with the latest conversion? |
22:14 |
|
kados |
(looks like the process stopped) |
22:14 |
|
sanspach |
looks like the script stopped executing; I got disconnected a couple times, but I thought it would keep going |
22:15 |
|
sanspach |
it was only about 1/4 done |
22:15 |
|
kados |
hmmm, guess not ... |
22:15 |
|
kados |
I can start it on my end -- sound good? |
22:15 |
|
sanspach |
I removed the partial files |
22:15 |
|
sanspach |
I had it running on my machine and it has finished |
22:15 |
|
kados |
sweet |
22:15 |
|
sanspach |
I'm bzip2'ing it now |
22:15 |
|
kados |
great |
22:17 |
|
sanspach |
as soon as it is done I'll start it transferring, but then I'm going to bed |
22:17 |
|
kados |
that's cool |
22:18 |
|
kados |
shoot me an email with the size and I'll start indexing when it's finished uploading |
22:18 |
|
sanspach |
will do |
22:32 |
|
Genji |
kados: tried my search options sidebar? |
02:31 |
|
paul |
hi hdl |
02:39 |
|
hdl |
hi paul |