Time |
S |
Nick |
Message |
12:51 |
|
thd |
kados: I just sent you the bzipped file. Sending had failed to work this morning because the gzipped file had exceeded my quota for attachment size. |
12:51 |
|
kados |
thd: thanks |
12:51 |
|
kados |
thd: I'll take a look asap |
12:53 |
|
thd |
kados: the useful file content is not very large but the HTML log files contained even unused records :) |
12:55 |
|
kados |
thd: very useful explanation |
12:55 |
|
kados |
thd: in the email |
12:55 |
|
kados |
thd: thanks |
12:56 |
|
thd |
kados: I try to avoid being incomplete in explanation :) |
14:27 |
|
kados |
owen: have you worked at all on the barcodes stuff? |
14:28 |
|
kados |
owen: if so, could you commit what you've got? |
14:28 |
|
kados |
owen: I was gonna work on it a bit but wanted to avoid merge conflicts |
14:29 |
|
owen |
I /did/ work on it. Then I overwrote it with a CVS update. :( |
14:30 |
|
kados |
hehe |
14:30 |
|
owen |
It was just cleanup, though. Easy to do again. *sigh* |
14:30 |
|
owen |
Say, while you're here... |
14:31 |
|
owen |
Is there any reason why the AmazonContent system pref shouldn't apply to OPAC and Intranet both? |
14:31 |
|
kados |
owen: no, just haven't merged that stuff again |
14:31 |
|
kados |
owen: I'll hack on the barcodes stuff |
14:31 |
|
kados |
owen: get it working first |
14:32 |
|
kados |
owen: then you can beautify it :-) |
14:32 |
|
owen |
Right now it's inside "unless ($in->{'type'} eq "intranet") {" |
14:32 |
|
owen |
Can I just move the 'AmazonContent => C4::Context->preference("AmazonContent"),' outside of that unless? |
14:34 |
|
owen |
Hmmm... looks like I can. |
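For reference, a minimal sketch of the change owen just confirmed works, assuming the surrounding template-parameter code in C4 looks roughly like the fragment he quoted (the other hash entries are illustrative):

    # Before: the pref only reaches OPAC templates.
    unless ( $in->{'type'} eq "intranet" ) {
        $template->param(
            AmazonContent => C4::Context->preference("AmazonContent"),
            # ... other OPAC-only parameters ...
        );
    }

    # After: set it unconditionally so OPAC and Intranet templates both see it.
    $template->param(
        AmazonContent => C4::Context->preference("AmazonContent"),
    );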
14:54 |
|
owen |
kados: why doesn't labels-home.pl belong in /barcodes ? |
15:06 |
|
kados |
owen: all the -home stuff should go in the root dir |
15:06 |
|
kados |
owen: ok, cvs is updated as far as placement and links go |
15:07 |
|
kados |
owen: it's all yours |
15:07 |
|
kados |
owen: I've sent a mail to mason with a list of remaining tasks |
15:07 |
|
kados |
owen: having to do with db updating, installer, etc. |
15:08 |
|
kados |
owen: re: amazon change, sure |
15:08 |
|
kados |
owen: not sure why it was wrapped in that to begin with |
16:04 |
|
osmoze |
hello |
16:39 |
|
slef |
hello osmoze |
16:40 |
|
slef |
Are the .nz working today or is it a holiday there now? |
16:57 |
|
kados |
i think it's holiday until tuesday |
16:57 |
|
kados |
monday for us |
17:20 |
|
chris |
yep its a holiday, im around for the next 20 mins or so tho if you need something slef? |
17:39 |
|
slef |
chris: mostly curious. Nothing in partic unless you can make si /mode #koha -t ;-) |
17:39 |
|
slef |
kados: what? The US doesn't celebrate Easter Monday? |
17:40 |
|
kados |
slef: no idea, I don't do holidays :-) |
17:41 |
|
chris |
ahh, ill ask next time i see him slef |
17:41 |
|
chris |
easter and christmas nz pretty much shuts down ... you have to get permission to trade |
17:42 |
|
kados |
heh |
17:42 |
|
chris |
and you have to pay your staff double time |
17:42 |
|
chris |
i think its double time .. might be time and a half |
17:42 |
|
chris |
so only gas stations, movie theatres and restaurants generally |
17:42 |
|
chris |
oh and there is a rogue garden store chain |
17:42 |
|
chris |
that opens and gets fined every year |
17:43 |
|
kados |
hehe |
17:44 |
|
chris |
then we have anzac day next tuesday .. lots of public holidays in april |
17:44 |
|
chris |
(the 25th) |
17:44 |
|
slef |
chris: it's like Eng 20 years ago... |
17:45 |
|
chris |
:-) |
17:45 |
|
chris |
20 years ago .. shops didnt open on sundays |
17:45 |
|
slef |
although in the village I grew up, it closed pretty much every weekend |
17:45 |
|
chris |
and there was no advertising on tv on sundays |
17:45 |
|
chris |
in nz |
17:46 |
|
chris |
i forget when 6 oclock closing ended |
17:47 |
|
chris |
1967 |
17:47 |
|
chris |
it got changed to 10 oclock closing |
17:47 |
|
chris |
i dont think there is a set time anymore, there are bars that seem to never be closed :) |
17:48 |
|
slef |
we've only just got deregulated opening times for bars |
17:48 |
|
chris |
does mail come on sunday in the US joshua? |
17:48 |
|
slef |
shops have been 24 hours for a while |
17:48 |
|
chris |
ahh yes i remember seeing that on the news slef |
17:48 |
|
slef |
s/deregulated/derestricted/ |
17:48 |
|
chris |
right |
17:48 |
|
slef |
still the police object to pretty much all applications for longer hours |
17:49 |
|
slef |
as far as I can tell |
17:49 |
|
slef |
given most the bars in this town, that's probably correct, though :-/ |
17:49 |
|
chris |
:) |
17:49 |
|
chris |
whats the drinking age 20? |
17:49 |
|
slef |
England has binge drinking trouble as we adapt to the idea that alcohol is always available |
17:49 |
|
slef |
18 |
17:50 |
|
chris |
ahh same as here then |
17:50 |
|
chris |
yeah, binge drinking is still a problem in nz |
17:51 |
|
chris |
mainly young kids |
17:52 |
|
chris |
so they are proposing to raise the drinking age again .. which i suspect wont do a damn thing except hide the problem again |
17:52 |
|
slef |
and you can legally drink younger than that in some situations which I forget |
17:52 |
|
slef |
personally, I think if you're old enough to get it, you're old enough to drink it... sooner you start, sooner you learn when to stop |
17:52 |
|
slef |
as ever, some people are slower learners than others ;-) |
17:52 |
|
slef |
I just hope the English figure out drinking before we drag the rest of Europe down with us |
17:52 |
|
slef |
I think only the Nordics have a worse reputation... and they had even tighter alcohol restrictions! |
17:52 |
|
chris |
IMO there isnt actually anymore drinking going on by 18,19 year olds, its just they can do it in the open now |
17:52 |
|
slef |
for sure |
17:53 |
|
chris |
i forget where in the UK are u slef? |
17:53 |
|
chris |
i have friends who I went to uni with, who are pharmacists in bristol |
18:05 |
|
chris |
hi kyle |
18:05 |
|
chris |
nice work on the template, very snazzy |
18:05 |
|
kados |
hey kyle |
18:09 |
|
slef |
Someone buy .nz a new link to the world, please. |
18:10 |
|
slef |
at least to .uk |
18:10 |
|
slef |
<slef> http://mjr.towers.org.uk/blog/2006/cyclynn has some pictures of just |
18:10 |
|
slef |
outside town |
18:10 |
|
slef |
<slef> yes, it is really that flat |
18:10 |
|
slef |
<slef> no, it's not all like that here (despite what some morons tell you) |
18:10 |
|
slef |
<slef> http://mjr.towers.org.uk/photos/lynn/lynnwide is the town, as seen from |
18:10 |
|
slef |
across the river |
18:10 |
|
slef |
<chris> i have friends who I went to uni with, who are pharmacists in bristol |
18:10 |
|
slef |
*** The server says: ERROR :Closing Link: 50-30-55-213.adsl.realpoptel.net |
18:10 |
|
chris |
ahh cool |
18:11 |
|
slef |
what did I miss? |
18:11 |
|
chris |
nothing really |
18:11 |
|
chris |
i just congratulated kyle on the template he committed |
18:11 |
|
chris |
and kados said hi, thats about it |
18:12 |
|
slef |
Bristol is about 4hrs train or full-speed driving away |
18:12 |
|
slef |
but I'll probably move there soon |
18:12 |
|
slef |
well, near |
18:13 |
|
chris |
ahh ok |
18:14 |
|
slef |
btw, if you mouseover bits of the image, the titles tell you what's what |
18:14 |
|
slef |
on lynnwide that is |
18:14 |
|
chris |
oohh tricky |
18:15 |
|
chris |
thats cool |
18:15 |
|
slef |
I can't identify everything |
18:15 |
|
slef |
there are buildings you see really clearly from Ferry Square that I don't usually notice |
18:15 |
|
chris |
http://photos.bigballofwax.co.[…]llington.jpg.html |
18:16 |
|
chris |
how many people live there? |
18:16 |
|
slef |
10k in the main town, 40k inside the bypass |
18:17 |
|
chris |
right, so quite small in the scheme of things |
18:17 |
|
slef |
(there are four or so villages inside the bypass, slowly running into the town) |
18:17 |
|
chris |
ahh right |
18:17 |
|
slef |
yes and no |
18:18 |
|
slef |
I think it's a very small town, but it's the biggest place for 35 miles by land |
18:18 |
|
chris |
right, its reasonably sized compared to a lot of places in nz too |
18:19 |
|
slef |
most of the stuff until those places to the south and west looks like the pictures from the bridge |
18:19 |
|
slef |
so it has more shops and stuff than a town of 10k usually would |
18:19 |
|
Genji |
hello all. |
18:19 |
|
slef |
and more factories... it's just generally a bit wrong |
18:20 |
|
slef |
hello Genji |
18:21 |
|
slef |
how many in Wellington? |
18:21 |
|
chris |
hmm good question |
18:21 |
|
chris |
3 or 4 cities .. wellington itself, porirua, hutt city, and upper hutt ... i think combined its around 600k |
18:22 |
|
chris |
im not sure how many in wellington proper |
18:22 |
|
slef |
look of it reminds me of Toronto somehow |
18:22 |
|
slef |
apart from the wooded hill, which reminds me of Worlebury (small, don't ask) |
18:22 |
|
chris |
heh |
18:22 |
|
Genji |
okay.. im going to be doing chores but can someone please inform me of the current bugs and feature requests on Koha? want to get back into it. |
18:24 |
|
chris |
ah ha |
18:24 |
|
chris |
2001 census 163k in wellington city |
18:24 |
|
chris |
423k in wellington region |
18:25 |
|
slef |
is the region much bigger? |
18:25 |
|
chris |
hmmm, kados might be the best one to answer that genji |
18:26 |
|
chris |
well it includes those other 3 cities .. but from the city of the centre to the outlying cities, furthest one would be 30 mins in a car |
18:26 |
|
chris |
centre of the city even :) |
18:27 |
|
Genji |
when is kados most active? |
18:27 |
|
chris |
http://maps.google.com/maps?f=[…]0.282259,0.839767 |
18:28 |
|
chris |
wellington city is at teh southern end of the harbour |
18:28 |
|
chris |
across the harbour to the northeast is huttcity, and upper hutt |
18:28 |
|
slef |
http://tx.mb21.co.uk/gallery/worlebury-hill.asp but it only has a picture of Crook Peak on the other side of the valley (picture with the Hutton label) |
18:28 |
|
chris |
and north is porirua |
18:29 |
|
slef |
heh, maps.google.com is a night view here |
18:29 |
|
slef |
biiig black rectangle |
18:29 |
|
chris |
heh |
18:29 |
|
chris |
thats not that helpful :) |
18:29 |
|
slef |
http://uk2.multimap.com/client[…]000&scale=4000000 |
18:30 |
|
chris |
genji: might be a good idea to drop a mail to koha-devel, that way lots of eyes will see it |
18:30 |
|
slef |
next large place is peterborough, 33 miles WSW |
18:30 |
|
chris |
ahh |
18:31 |
|
slef |
next political capital upwards is Norwich, 45 miles E |
18:31 |
|
chris |
right, i have a much better idea now |
18:31 |
|
slef |
forgot I had that map bookmarked |
18:31 |
|
chris |
im not sure why, but i had it in my head you were way more to the west |
18:32 |
|
Genji |
we dont have a bugzilla anymore? |
18:32 |
|
chris |
yep we do |
18:32 |
|
chris |
bugs.koha.org |
18:32 |
|
slef |
.uk sites don't like concave coastlines - I search for cinema showings and get results for "Skegness, 22 miles North" |
18:32 |
|
chris |
anything there you are welcome to have a go at |
18:32 |
|
slef |
(~70 miles by road) |
18:32 |
|
chris |
heh |
18:33 |
|
slef |
chris: I'm from near Northampton and spend some time near Bristol. |
18:33 |
|
slef |
went to uni in Norwich and got distracted on the way back |
18:34 |
|
chris |
hehe |
18:34 |
|
slef |
that was, what, 12 years ago now |
18:38 |
|
Genji |
what libs use NPL templates? |
18:38 |
|
chris |
ok im gonna go out for a bit and enjoy some sun |
18:39 |
|
chris |
probably be back later |
18:39 |
|
chris |
NPL do genji .. and i think liblime's clients use variants of it but im not sure |
19:04 |
|
Genji |
how is it that all bugs are assigned to people, even though their status is NEW? |
19:14 |
|
slef |
bugzilla assigns them automatically |
19:14 |
|
slef |
doesn't mean much unless they accept them |
19:22 |
|
kados |
thd: you around? |
19:24 |
|
thd |
kados: yes I am sending you some media type code |
19:24 |
|
kados |
thd: the script's 90% done |
19:24 |
|
kados |
thd: already got the media type codes |
19:24 |
|
kados |
thd: http://www.itsmarc.com/crs/Bib0443.htm |
19:24 |
|
kados |
thd: unless they go above and beyond that list |
19:24 |
|
kados |
thd: the script works like a charm |
19:25 |
|
kados |
thd: just have to add a few more things and I'll be done |
19:25 |
|
kados |
thd: but I can actually insert the files as they are so you can look at them |
19:27 |
|
thd |
kados: I have more media type code than you could determine from that page. |
19:27 |
|
thd |
kados: you may be missing something for books. |
19:28 |
|
thd |
kados: I will send my code in two minutes. |
19:28 |
|
kados |
thd: do they all rely on the leader? |
19:29 |
|
thd |
kados: my code starts with the leader but you have to use both leader positions 06 and 07. |
19:30 |
|
thd |
kados: media type quickly becomes complex after leader but leader code is sufficient for what we need today. |
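For reference, a minimal sketch of leader-based media typing with MARC::Record. This is not thd's actual code; the labels are an illustrative subset of the MARC 21 leader/06 (type of record) and leader/07 (bibliographic level) values:

    use MARC::Record;

    sub media_type {
        my $record = shift;
        my $type  = substr( $record->leader(), 6, 1 );   # type of record
        my $level = substr( $record->leader(), 7, 1 );   # bibliographic level
        return 'serial'          if $level eq 's';
        return 'book'            if $type eq 'a' || $type eq 't';
        return 'notated music'   if $type eq 'c' || $type eq 'd';
        return 'sound recording' if $type eq 'i' || $type eq 'j';
        return 'map'             if $type eq 'e' || $type eq 'f';
        return 'computer file'   if $type eq 'm';
        return 'other';
    }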
19:30 |
|
kados |
k |
19:31 |
|
kados |
thd: leader position6 only has two values, right? |
19:32 |
|
thd |
kados: more than two |
19:38 |
|
thd |
kados: you should have the message now |
19:41 |
|
Genji |
kados: media type codes? like my idea i implemented last year? |
19:42 |
|
thd |
Genji: what was your idea? |
19:43 |
|
Genji |
hmm... checking if its still in the cvs... |
19:44 |
|
Genji |
the root of koha module has been trimmed extremely.... |
19:45 |
|
Genji |
okay... |
19:46 |
|
Genji |
ah... maybe mediatype is different.. but my implementation, which is in cvs, is mediatype -> itemtype-> itemsubtype |
19:47 |
|
thd |
kados: obviously parts of my code are missing something as I stopped part way through but those issues can be easily fixed. Some variables certainly need a larger scope or some different treatment. |
19:48 |
|
thd |
Genji: how did you determine media type in your code? |
19:50 |
|
thd |
Genji: kados and I were discussing reading some media type information from the leader, etc. for copy catalogued records. |
19:50 |
|
Genji |
new tables... media type table, itemtype table, itemsubtype table... one linked to the other. for instance... CNFB means Children nonfiction books... ahh.. copying cataloged records.. right. haven't got that far. |
19:52 |
|
kados |
thd: remember that error you were having when attempting to import record #11? |
19:53 |
|
kados |
thd: these records have some major problems with encoding |
19:53 |
|
kados |
thd: mostlikely because of the way they were saved |
19:53 |
|
kados |
thd: did you write binmode utf8 when you wrote them to file? |
19:53 |
|
kados |
thd: (did php or perl write them to file?) |
19:54 |
|
kados |
thd: (and were they downloaded in raw format?) |
19:54 |
|
thd |
Genji: the media type is a very amorphous term. MARC generally uses media types contained in standard cataloguing rules but spreads the information all over the record while the leader contains the most important information in positions 06 and 07. |
19:54 |
|
kados |
the script i wrote dies with: |
19:54 |
|
kados |
utf8 "\xEC" does not map to Unicode at /usr/lib/perl/5.8/Encode.pm line 164, <INFILE> line 171. |
19:54 |
|
kados |
on record #11 |
19:55 |
|
thd |
kados: they were raw and I mostly had no problems but I did nothing to encode the raw data. The raw data was the raw data except for display. |
19:56 |
|
thd |
kados: Is record 11 Unicode and not MARC 8? |
19:57 |
|
kados |
no |
19:57 |
|
kados |
NUMBER 10 => |
19:57 |
|
kados |
LDR 00450cam 2200157 4500 |
19:57 |
|
kados |
NUMBER 11 => |
19:57 |
|
kados |
LDR 01467cam 2200361 i 4500 |
19:57 |
|
kados |
at least it claims to be marc-8 |
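(The claim comes from leader position 09, the MARC 21 character coding scheme: blank means MARC-8, 'a' means UCS/Unicode. With MARC::Record that is a one-liner:

    my $enc = substr( $record->leader(), 9, 1 ) eq 'a' ? 'UTF-8' : 'MARC-8';

Position 09 appears blank in both LDRs above, hence MARC-8, though the log display may have collapsed the leader's spaces.)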
19:57 |
|
thd |
kados: look at 11.html . Do you see anything wrong? |
19:57 |
|
kados |
however, it's perl that's complaining in this case |
19:58 |
|
kados |
thd: no, not sure what that would tell me anyway |
19:58 |
|
kados |
thd: it's the encoding of a char that's the prob |
19:59 |
|
thd |
kados: why does Perl want to complain about characters that should all be ASCII. |
19:59 |
|
thd |
? |
20:00 |
|
thd |
kados: what is the offending character? |
20:00 |
|
Genji |
offline for chores |
20:01 |
|
thd |
kados: These records should not have encoding issues except for maybe some native Alaskan language characters which I had not noticed in any records. |
20:03 |
|
thd |
kados: does YAZ itself have some encoding bugs? |
20:12 |
|
thd |
kados: PHP wrote the raw records but did not alter their content, except later in the code where htmlspecialchars() is used to encode the record for posting in a form, for manual record saving only after LWP-directed automated saving has already happened. The htmlspecialchars() encoding is removed at the time of manual saving, after parsing the post information. |
20:13 |
|
thd |
kados: I specifically avoided passing the raw record over LWP to avoid any possible encoding problems. |
20:15 |
|
thd |
kados: raw is as raw as YAZ provided. If there are encoding problems those existed in the original record. Would Columbia University Library really inflict encoding problems on an unsuspecting world? :) |
20:27 |
|
kados |
thd: i found a workaround finally |
20:27 |
|
kados |
thd: 479 records in the file? |
20:28 |
|
thd |
kados: what was the problem? |
20:28 |
|
kados |
thd: bad encoding |
20:29 |
|
thd |
kados: where was the bad encoding? |
20:30 |
|
kados |
thd: in several of the records |
20:30 |
|
kados |
thd: it's impossible to know where |
20:30 |
|
thd |
kados: I think there should be 456 records in the file. |
20:30 |
|
kados |
I have 479 :-) |
20:30 |
|
kados |
importing them into Koha now |
20:31 |
|
kados |
http://library.afognak.org/ |
20:31 |
|
thd |
kados: ok more is better, maybe my script miscounted. |
20:31 |
|
kados |
hmmm, some have 0 count |
20:31 |
|
kados |
must be a flaw in my script |
20:32 |
|
kados |
and we've got major encoding probs |
20:32 |
|
kados |
http://library.afognak.org/cgi[…]-detail.pl?bib=12 |
20:32 |
|
kados |
for instance |
20:33 |
|
thd |
kados: that is very pretty, are they all like that? |
20:34 |
|
kados |
maybe we only have a few encoding probs |
20:34 |
|
kados |
some of the records didn't convert to utf-8 |
20:34 |
|
kados |
12 EncodingMARC-8 |
20:34 |
|
kados |
13 EncodingMARC-8 |
20:34 |
|
kados |
128 EncodingMARC-8 |
20:34 |
|
kados |
175 EncodingMARC-8 |
20:34 |
|
kados |
219 EncodingMARC-8 |
20:34 |
|
kados |
299 EncodingMARC-8 |
20:34 |
|
kados |
302 EncodingMARC-8 |
20:34 |
|
kados |
326 EncodingMARC-8 |
20:34 |
|
kados |
330 EncodingMARC-8 |
20:34 |
|
kados |
331 EncodingMARC-8 |
20:34 |
|
kados |
332 EncodingMARC-8 |
20:34 |
|
kados |
333 EncodingMARC-8 |
20:34 |
|
kados |
334 EncodingMARC-8 |
20:34 |
|
kados |
393 EncodingMARC-8 |
20:34 |
|
kados |
407 EncodingMARC-8 |
20:34 |
|
kados |
15 to be exact |
20:34 |
|
kados |
I bet everything but those 15 is ok |
20:35 |
|
thd |
kados: what happened to record 11? |
20:36 |
|
thd |
which would not have been 11.html because there were gaps. |
20:40 |
|
thd |
kados: look at the bad record; more importantly, the MARC view shows an entirely different record from the detail view for http://library.afognak.org/cgi[…]-detail.pl?bib=12 |
20:42 |
|
thd |
kados: the MARC record looks like a bad title-only match for "Am salmon" |
20:46 |
|
kados |
thd: explain to me how the records are saved in the file with the tabs |
20:46 |
|
thd |
or rather an author-title match; those are also suspect matches |
20:47 |
|
kados |
thd: I think the problem with this whole thing is that you saved binary marc files with all different encodings into a file without specifying or controlling the encoding |
20:47 |
|
kados |
thd: so that file has mixed encoding |
20:49 |
|
kados |
nope, I'm wrong about even that |
20:49 |
|
thd |
if (!empty($save_best) && !empty($recQuality) && $bestRec == 1) { |
20:49 |
|
thd |
$fh = fopen($marcCaptureFile, 'a') or die("can't open file"); |
20:49 |
|
thd |
fwrite($fh, $rawRec); |
20:49 |
|
thd |
fclose($fh); |
20:49 |
|
thd |
$extraValues[] = ""; |
20:49 |
|
thd |
$extraValues = array($progress, $version, |
20:49 |
|
thd |
$recQuality, $recYearMatch, $hostSpec, |
20:49 |
|
thd |
$search_fields, $isbn, $author, $title, |
20:49 |
|
thd |
$publication, $pub_place, $pub_year, |
20:49 |
|
thd |
$l_subject, $l_subject_subdivision, |
20:49 |
|
thd |
$purchased_price, $quantity, $rawRec); |
20:49 |
|
thd |
$extraValuesRow = implode("\t", $extraValues) . "\n"; |
20:49 |
|
thd |
$fh = fopen($marcExtraValuesFile, 'a') or die("can't open file"); |
20:49 |
|
thd |
fwrite($fh, $extraValuesRow); |
20:49 |
|
thd |
fclose($fh); |
20:49 |
|
thd |
} |
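A rough sketch of the consumer side, for reference: each row of the extra-values file is read back by splitting on tabs, with the raw ISO 2709 record as the last field (field order per the $extraValues array above). Note the fragility this framing implies; any tab or newline byte inside the binary record corrupts the row:

    chomp( my $row = <INFILE> );          # strip the trailing "\n" added above
    my @fields = split /\t/, $row, -1;    # limit -1 keeps trailing empty fields
    my $rawRec = $fields[-1];             # raw MARC record, per $extraValues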
20:49 |
|
kados |
thd: is this perl? |
20:50 |
|
thd |
kados: no, Perl uses join, not implode :) |
20:53 |
|
kados |
thd: so this is php? |
20:53 |
|
thd |
kados: I assume that you would have the same problems if you tried importing the MARC records from the MARC records only file. |
20:53 |
|
kados |
thd: I have even more probs |
20:53 |
|
thd |
yes PHP :( |
20:53 |
|
kados |
hmmm |
20:54 |
|
kados |
I don't know what to say |
20:54 |
|
kados |
hmmm |
20:54 |
|
thd |
kados: what are the more problems? |
20:55 |
|
kados |
how can we salvage this? |
20:55 |
|
thd |
kados: I can send you the scripts and we can rerun everything from your speedy system after we find the problem. |
20:55 |
|
kados |
hmmm |
20:56 |
|
thd |
kados: what is the problem except for 15 records? |
20:56 |
|
kados |
I don't even know where to begin looking for the problem |
20:56 |
|
thd |
kados: but what problem do you actually see aside from 15 records? |
20:57 |
|
kados |
well, I can't seem to convert the marc-8 to utf-8 |
20:57 |
|
kados |
because of the encoding probs |
20:57 |
|
kados |
so if I leave everything as marc-8 |
20:57 |
|
kados |
I can import all but 3 or so |
20:58 |
|
kados |
and at least 15 records appear mangled |
20:58 |
|
thd |
kados: I was uncertain about how I added the newline separating the rows, but do you see a problem there with . "\n" |
20:58 |
|
thd |
? |
20:58 |
|
kados |
yes that is also a problem |
20:58 |
|
kados |
but I was able to correct that |
20:58 |
|
kados |
in a true marc file |
20:59 |
|
kados |
you wouldn't have \n as the last char |
20:59 |
|
kados |
I was able to chomp() that line to remove it |
21:00 |
|
thd |
kados: yes of course but was there any strange character just before the newline? |
21:01 |
|
kados |
thd: well, there is the end of file that every marc has |
21:02 |
|
thd |
kados: as long as that is a well formed end of file then that is fine. |
21:04 |
|
thd |
kados: what happens if you try to import the records from the MARC only file? |
21:04 |
|
thd |
kados: What workaround did you use for the 11th record? |
21:04 |
|
kados |
thd: it dies on record 171 if I try to import from marc only |
21:05 |
|
thd |
kados: so in Perl you made some character conversion? |
21:12 |
|
kados |
thd: first I tried doing nothing with encoding |
21:12 |
|
kados |
then, I tried converting everything to utf-89 |
21:12 |
|
kados |
utf-8 even |
21:12 |
|
kados |
basically the real problem |
21:13 |
|
kados |
is that the records weren't saved correctly |
21:13 |
|
kados |
they must have been re-encoded by php or something |
21:14 |
|
kados |
were they downloaded directly in binary marc or were they scraped off an html page? |
21:14 |
|
thd |
kados: there were direct raw MARC so that we would not have this problem? |
21:15 |
|
kados |
hmmm |
21:15 |
|
kados |
very strange |
21:15 |
|
thd |
s/there/they/ |
21:15 |
|
kados |
so how did you accomplish the download? |
21:15 |
|
kados |
sometimes if you don't specify a binary transfer it doesn't do a binary transfer |
21:15 |
|
kados |
is that a possible cause of the problem? |
21:18 |
|
thd |
kados: $rawRec = yaz_record($id[$i],$p,"raw"); |
21:18 |
|
thd |
kados: raw is raw |
21:18 |
|
kados |
huh |
21:19 |
|
kados |
then presumably you write that $rawRec to a filehandle? |
21:19 |
|
kados |
huh ... what version of yaz are you running on that box? |
21:20 |
|
thd |
yes, written directly to filehandle |
21:24 |
|
thd |
kados: I have YAZ version 2.1.8-4 |
21:28 |
|
thd |
kados: I think I built PHP/YAZ for PHP5 because the Debian package is only for PHP4, and rather old. |
21:30 |
|
thd |
kados: Do you want to try running against the sample 29 records on your system? |
21:33 |
|
thd |
s/running/running the LWP and PHP script/ |
21:36 |
|
thd |
kados: are all records affected or only 15? |
21:58 |
|
thd |
kados: the content is ASCII the conversion presumably does nothing but change the indicated encoding in the leader |
21:58 |
|
thd |
s/the/if the/ |
21:59 |
|
kados |
ok, so whether or not I convert to utf-8 it crashes on number 171 |
21:59 |
|
kados |
I was mistaken that they all imported if they were not re-encoded |
21:59 |
|
thd |
kados: that is the 171st record imported? |
21:59 |
|
kados |
yep |
21:59 |
|
kados |
well, give or take one :-) |
21:59 |
|
kados |
my counting is notoriously off by one :-) |
22:00 |
|
thd |
:) |
22:00 |
|
kados |
thd: I'm going to try eliminating all problematic records from the write |
22:01 |
|
thd |
kados: that sounds like an excellent plan |
22:03 |
|
thd |
at least as long as at least one record remains |
22:15 |
|
kados |
I'm going to just have to manually skip the problem records |
22:34 |
|
thd |
kados: I do a character conversion for display from a separate variable. Maybe a PHP bug creates an upstream problem. $rec is used for display only. $rec = yaz_record($id[$i],$p,"render; charset=marc8,utf-8"); |
22:34 |
|
kados |
hmmm |
22:34 |
|
kados |
it might set the charset in yaz-record |
22:34 |
|
kados |
or something |
22:36 |
|
thd |
kados: maybe I should be setting the value of $rawRec first in $rawRec = yaz_record($id[$i],$p,"raw"); |
22:36 |
|
thd |
kados: the order that the variables were set in had not seemed important |
22:37 |
|
thd |
and it should not be important :) |
22:37 |
|
kados |
agreed |
22:37 |
|
kados |
but it might be |
22:40 |
|
kados |
wow, this is strange |
22:40 |
|
thd |
kados: what is strange? |
22:40 |
|
kados |
in this case, it dies on record 17 even if I delete records 10-25 |
22:41 |
|
thd |
kados: delete records 10-25 again |
22:42 |
|
kados |
i did |
22:42 |
|
thd |
and? |
22:42 |
|
kados |
same error |
22:42 |
|
thd |
did it die on 17 again? |
22:42 |
|
kados |
yep |
22:42 |
|
thd |
kados: that is a loop error unless 17 is an unlucky number |
22:44 |
|
thd |
kados: koha has many loop errors in the templates at least. |
23:39 |
|
thd |
kados: after moving the setting of $rawRec for saving before $rec for display, I still have the XML/parser error on the 11th record. |
23:50 |
|
kados |
thd: got 398 records in |
23:50 |
|
kados |
thd: http://library.afognak.org |
23:51 |
|
kados |
thd: a search on 'and' pulls up 298 of them |
23:51 |
|
kados |
thd: seems like quite a few duplicates in there |
23:55 |
|
kados |
and I don't see any with more than one copy |
23:58 |
|
Genji |
kados: you awake? |
23:58 |
|
kados |
Genji: barely :-) |
23:59 |
|
Genji |
kados: good enuf. Im looking at doing some koha devel... have any particular bug / feature enhancement you think I could tackle on my first day back? |
23:59 |
|
kados |
great news! |
00:00 |
|
kados |
know much about encoding? |
00:00 |
|
Genji |
hmm.... not really. what sort of encoding? |
00:00 |
|
kados |
we're currently really hurting in the encoding area |
00:00 |
|
kados |
character encoding |
00:00 |
|
Genji |
utf8? |
00:00 |
|
kados |
iso-8859-1 vs utf-8 vs marc-8, etc. |
00:00 |
|
kados |
we need Koha to be able to handle any encoding we hand it |
00:01 |
|
kados |
well ... |
00:01 |
|
kados |
there's other stuff |
00:01 |
|
kados |
that's just first on my mind |
00:01 |
|
kados |
have you tried out the new zebra plugin? |
00:01 |
|
kados |
there's plenty of work to do on zebra |
00:01 |
|
kados |
in head |
00:02 |
|
kados |
hmmm |
00:02 |
|
kados |
not quite |
00:02 |
|
kados |
we have that already |
00:02 |
|
kados |
in the form of MARC::File::XML |
00:03 |
|
kados |
I've been struggling with a bug |
00:03 |
|
kados |
related to encoding |
00:03 |
|
kados |
in rel_2_2 |
00:03 |
|
kados |
if you use a Koha that has iso-8859 encoding in the db |
00:03 |
|
kados |
and you upgrade to rel_2_2 |
00:04 |
|
kados |
when you edit existing records |
00:04 |
|
kados |
the special characters get mangled |
00:04 |
|
kados |
I think it's perl's fault |
00:04 |
|
kados |
but I haven't been successful in tracking down exactly where it's happening |
00:06 |
|
Genji |
special characters like accented e's etc? |
00:06 |
|
kados |
yep |
00:07 |
|
Genji |
so... perl is converting the characters before the script has the chance to convert them? |
00:08 |
|
kados |
well, it's complicated |
00:08 |
|
kados |
if you use CVS rel_2_2 |
00:08 |
|
kados |
and you updatedatabase |
00:08 |
|
kados |
there's a new syspref |
00:08 |
|
kados |
TemplateEncoding |
00:08 |
|
kados |
set that to the desired encoding |
00:08 |
|
kados |
so if you're running an old Koha |
00:08 |
|
kados |
it should probably be |
00:09 |
|
kados |
iso-8859 |
00:09 |
|
kados |
(if it's unimarc that is) |
00:09 |
|
kados |
(in marc21 there are only two valid encodings: marc8 and utf8) |
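For reference, converting between those two is what MARC::Charset is for; a minimal sketch (by default the conversion dies on bytes it cannot map, which is the failure mode haunting this log):

    use MARC::Charset qw( marc8_to_utf8 );

    MARC::Charset->ignore_errors(1);        # in recent versions: drop unmappable
                                            # bytes instead of dying
    my $utf8 = marc8_to_utf8($marc8_bytes); # $marc8_bytes: a raw MARC-8 string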
00:09 |
|
kados |
so ... once you've done that |
00:09 |
|
kados |
look in addbiblio.pl |
00:10 |
|
kados |
ahh ... before that |
00:10 |
|
kados |
you need to upgrade MARC::Record, MARC::File::XML and MARC::Charset |
00:10 |
|
kados |
to the latest sourceforge versions |
00:10 |
|
kados |
the CPAN versions won't cut it |
00:10 |
|
kados |
hehe |
00:10 |
|
kados |
k |
00:10 |
|
kados |
cool |
00:13 |
|
si |
happy easter, joshua |
00:13 |
|
kados |
si: can you oper me? |
00:13 |
|
kados |
si: you too :-) |
00:13 |
|
kados |
go /set +o kados |
00:14 |
|
si |
or indeed |
00:14 |
|
si |
/mode #koha +o kados |
00:14 |
|
kados |
woot |
00:14 |
|
kados |
thx |
00:14 |
|
si |
no worries |
00:15 |
|
kados |
hehe |
00:19 |
|
thd |
kados: the source file had many duplicates |
00:21 |
|
kados |
thd: right ... |
00:21 |
|
kados |
thd: multiple items are now working |
00:21 |
|
kados |
http://library.afognak.org/cgi[…]-detail.pl?bib=26 |
00:21 |
|
thd |
kados: sometimes the same record appears in the source both with and without an ISBN |
00:26 |
|
si |
russ, you might find this useful |
00:26 |
|
si |
http://video.google.com/videop[…]imoncelli&pl=true |
00:27 |
|
si |
ack, wrong # |
00:27 |
|
thd |
kados: from that record it seems that you created 690 subfields even when the values for the 690 subfields were empty |
00:32 |
|
kados |
thd: fixing that now |
00:33 |
|
kados |
thd: fixed |
00:39 |
|
thd |
kados: there is a similar issue for cost, purchase price. |
00:39 |
|
kados |
yep, investigating now |
00:56 |
|
thd |
kados: that record is actually a bad match. OCLC does not have what the record ought to be. That goes with the unfindable shareholder guides to a native corporation. |
01:02 |
|
thd |
kados: I suppose that any characters in MARC 8 would necessarily exist in Unicode. I assume MARC::Charset includes every native american language in MARC 8. |
01:48 |
|
Genji |
ok.. back again... on and off doing dishes.... |
02:44 |
|
Genji |
hiya all |
02:47 |
|
pierrick |
hi Genji |
02:53 |
|
Genji |
pierrick: whats your pet bug/idea for koha? |
02:57 |
|
pierrick |
Genji, sorry, I'm not native englush speaker :-/ what do you mean "pet bug/idea"? |
02:57 |
|
pierrick |
s/englush/english |
03:00 |
|
paul |
hello all |
03:00 |
|
pierrick |
hello Paul |
03:04 |
|
ToinS |
hello |
04:03 |
|
thd |
pierrick: It seems that Genji missed answering your English question. Pet something is favourite something, usually a personal favourite. |
04:07 |
|
thd |
pierrick: I think Genji was trying to ask what special idea were you interested in pursuing or implementing in Koha. |
04:08 |
|
thd |
pierrick: a bug in this context is something that keeps motivating you to pursue something. |
04:09 |
|
thd |
pierrick: are you there? |
04:14 |
|
pierrick |
thd, I'm back |
04:14 |
|
pierrick |
thd, thank you for precision |
04:15 |
|
thd |
pierrick: So what are you especially interested in pursuing in or for Koha? |
04:15 |
|
paul |
thd : a quick english question |
04:15 |
|
thd |
yes paul |
04:15 |
|
pierrick |
Genji, I have no "pet idea/bug" for the moment. Maybe I'm interested in tagging biblio from users and presenting something like a tag cloud and related tags |
04:15 |
|
paul |
is "collectivity" a correct word so speak of an professional organisation |
04:16 |
|
paul |
(an institution, a company...) |
04:16 |
|
paul |
because i'm afraid it's a frenchism |
04:16 |
|
paul |
"collectivité" |
04:16 |
|
thd |
paul: Yes that is perfect French:) |
04:17 |
|
thd |
paul: try organisation |
04:18 |
|
thd |
paul: that would be the most interchangeable general term for institution, company, etc. |
04:18 |
|
paul |
pierrick: 1st bug squashing meeting => 18th, april, but which time ? |
04:20 |
|
paul |
(i should be here, with my tank, 2 fighters, 1 cruiser and at least 5 companies. I'll also get my +5 sword of the paladin and my shield of the deadbug) |
04:21 |
|
pierrick |
paul, I've written a specific mail for BSP |
04:22 |
|
paul |
ok, I missed it |
04:22 |
|
thd |
paul: you need some special potions to keep the bugs from returning from the dead |
04:23 |
|
thd |
pierrick: what is tagging biblios from users? |
04:24 |
|
paul |
tagging biblios is a very interesting idea, if I understand what it is ;-) |
04:25 |
|
thd |
paul: if you understand what is it? |
04:25 |
|
paul |
yes, I think I know what pierrick is speaking of, and i'm waiting for his explanation. |
04:25 |
|
paul |
but if i'm right, it's an interesting idea ;-) |
04:25 |
|
Genji |
whats tagging biblios? |
04:26 |
|
thd |
paul: I hope it is even an interesting idea if you are wrong :) |
04:26 |
|
pierrick |
http://lists.nongnu.org/archiv[…]-10/msg00011.html |
04:27 |
|
pierrick |
I wanted to see if you already had the idea before me, and yes. But it does not seem to have been implemented |
04:28 |
|
thd |
pierrick: How would you describe tagging biblios from users? I know what the individual words mean but not the concept that you are attempting to describe. |
04:28 |
|
pierrick |
"Tagging" is a very common features nowadays in applications managing items (any kind of items). |
04:29 |
|
Genji |
hmmm.. users tag.. as in add subjects to their own books? |
04:29 |
|
thd |
pierrick: I have had a few interesting ideas before you were born :) |
04:29 |
|
pierrick |
thd, an example: you're connected to the OPAC, you can add some tag to a biblio |
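None of this exists in Koha at the time of this log; purely as a hypothetical sketch, user tagging needs little more than one table linking borrowers, biblios and tag strings (all names here are illustrative), plus the moderation flag discussed below:

    # Hypothetical schema:
    #   CREATE TABLE biblio_tags (
    #       biblionumber   INT          NOT NULL,
    #       borrowernumber INT          NOT NULL,
    #       tag            VARCHAR(255) NOT NULL,
    #       approved       TINYINT      DEFAULT 0   -- librarian validation
    #   );
    my $dbh = C4::Context->dbh;
    $dbh->do(
        'INSERT INTO biblio_tags (biblionumber, borrowernumber, tag)
         VALUES (?, ?, ?)',
        undef, $biblionumber, $borrowernumber, $tag
    );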
04:30 |
|
thd |
pierrick: you mean user added notes fields. |
04:30 |
|
pierrick |
thd, yes some kind of |
04:31 |
|
pierrick |
but maybe having only "tagging" and not "user tagging" would be enough |
04:31 |
|
pierrick |
The idea is the new navigation way it creates |
04:32 |
|
thd |
pierrick: well the cataloguer can already add notes if that is what you mean by tagging. |
04:32 |
|
pierrick |
thd, why not use it |
04:33 |
|
pierrick |
what I mean is having this kind of navigation : http://www.modusoptimus.com/bsf/tags.php |
04:33 |
|
thd |
pierrick: what do you mean by why not using it? |
04:33 |
|
pierrick |
I mean I don't really mind were tags come from |
04:33 |
|
pierrick |
from users, from librarians... |
04:34 |
|
thd |
pierrick: yes records need as many access points and as much content as can be provided. |
04:34 |
|
pierrick |
the origin is a question, but IMO the most important is the navigation |
04:35 |
|
pierrick |
if users can participate, it can be interesting but not mandatory |
04:35 |
|
thd |
pierrick: such tags seem like user added subject headings. |
04:35 |
|
pierrick |
thd, yes. The admin of the gallery has added the tags manually |
04:35 |
|
pierrick |
(through metadata in reality) |
04:37 |
|
pierrick |
having a navigation mode based on chronology would be interesting too |
04:38 |
|
pierrick |
I think the current OPAC is only a search form while we could provide other navigation modes |
04:38 |
|
thd |
pierrick: I had asked hdl some time ago why that was not already a feature extension to the virtual bookshelves. |
04:38 |
|
paul |
pierrick: the best would be to have a world-wide tag system, based on ISBN |
04:38 |
|
pierrick |
search mode, tag mod, category mode, chronology mode, best rated, most read, etc. |
04:39 |
|
paul |
when a library in the Koha network gets a tag, it is sent to a central server, which can distribute it to all libraries during the night. |
04:39 |
|
thd |
pierrick: that is a very important concept. Blank search forms are a very limited concept. |
04:39 |
|
paul |
OK guys, I've just commited many many things for borrowers improvements. |
04:39 |
|
pierrick |
paul, that would mean the set of available tags is centralized |
04:40 |
|
paul |
it works correctly for me, I think i've committed everything. Could someone check that an update of CVS + updater/updatedatabase makes the feature work ? |
04:40 |
|
paul |
you can check what it does at : |
04:41 |
|
paul |
http://i20.bureau.paulpoulain.[…]s/members-home.pl |
04:41 |
|
paul |
(login test/test) |
04:41 |
|
pierrick |
thd, my experience is 5 years of photo gallery development, and IMO managing biblio is not that different from managing photographs |
04:41 |
|
paul |
pierrick: why do you want an available tag list ? |
04:41 |
|
thd |
pierrick: It would not need to be centralised if every library was free to add individual fields and subfields from records at any other library automatically in a distributed network. |
04:41 |
|
pierrick |
thd, in PWG there are several mode of navigation, not only the search one. |
04:42 |
|
paul |
I thought such systems were without any available tag list, and the user could enter what he wants. |
04:42 |
|
thd |
pierrick: what is PWG? |
04:42 |
|
pierrick |
thd, PhpWebGallery |
04:43 |
|
pierrick |
paul, don't you think it would become a real mess if all libraries share their tags? |
04:43 |
|
paul |
in fact, i'm not sure there would be so many ppl entering tags. so having them worldwide could improve their interest a lot |
04:43 |
|
paul |
oups, I must leave NOW |
04:44 |
|
thd |
paul: well one could have both a standard tag thesaurus and free form tagging. One need not exclude the other. |
04:45 |
|
thd |
pierrick: the user can have filters to protect himself from the mess potential. |
04:46 |
|
pierrick |
thd, of course, it could be mandatory that a librarian validate tags before making them public |
04:47 |
|
thd |
pierrick: I favour standard thesauri but I also like the idea of giving users freedom to contribute to the library in any way that is comfortable to the user. |
04:48 |
|
pierrick |
I know an online service blogmarks.net that let each user have public and private tags on their bookmarks |
04:49 |
|
thd |
pierrick: yes there will always be the issue of the librarian needing to protect the library institution form users who may see public tagging as a forum for causing mischief :) |
04:49 |
|
thd |
s./form/from/ |
04:51 |
|
thd |
pierrick: I would like to see such a feature implemented in a MARC compliant manner even if it would inevitably lead to breaking the ISO 2709 record size barrier requiring XML. |
04:51 |
|
thd |
pierrick: How is your study of MARC going? |
04:52 |
|
pierrick |
after this discussion, I would answer Genji that I would like to have other navigation mode in the OPAC, not only the search mode |
04:53 |
|
pierrick |
thd, not very far. I think I understand the tag/subfields structure. I can't yet bind the description to each tag/subfield depending on the MARC flavour |
04:55 |
|
thd |
pierrick: full marks!! see my brief paragraphs about the alternative to the search paridym at http://www.agogme.com |
04:56 |
|
thd |
s/paridym/paradigm/ |
04:57 |
|
thd |
pierrick: you should look at a library science textbook to understand the concepts behind what goes into MARC well. |
05:10 |
|
thd |
pierrick: There is a generally well respected book by Chan, although, I unfortunately do not have a copy. |
05:12 |
|
thd |
pierrick: http://www.amazon.com/gp/product/0070105065 |
05:14 |
|
thd |
pierrick: I do not know what might be a French equivalent, but would be pleased to know |
05:16 |
|
pierrick |
thanks a lot thd, I don't need a translation, the majority of my technical books are in english |
05:24 |
|
thd |
pierrick: the Chan book is most likely excellent but it may be helpful to consult also a book on French UNIMARC practise. |
05:24 |
|
thd |
pierrick: http://www.amazon.fr/exec/obidos/ASIN/2765405514 |
05:29 |
|
slef |
;-) |
05:29 |
|
thd |
pierrick: and volume 2 http://www.amazon.fr/exec/obidos/ASIN/2765408246 |
05:31 |
|
thd |
what is this error: Can't locate object method "as_xml" via package "MARC::Record" ? |
05:34 |
|
pierrick |
slef, what does "puts the boot" mean? |
05:36 |
|
thd |
pierrick: Savannah hardware is not fast enough for me to see exactly what slef means yet. |
05:37 |
|
thd |
pierrick: the term may generally mean to impolitely kick another in the trousers to get their attention, rather forcefully. |
05:38 |
|
slef |
thd: it's obtaining possession of the football by kicking the player in the back of the leg |
05:39 |
|
pierrick |
slef, you answered to my mail about forum Vs mailing-list? |
05:39 |
|
slef |
yep |
05:40 |
|
pierrick |
didn't receive it yet |
05:40 |
|
thd |
slef: I understand what that would mean but that leaves me even more uncertain of your message. |
05:40 |
|
pierrick |
but I had hoped you wouldn't answer |
05:42 |
|
slef |
thd: je rigole |
05:42 |
|
thd |
pierrick: FSF is buying new servers for the mail system and improving the routing so the mail queue may not be an endless disc thrashing session in a few months. They know the delay of the mail queue is a very important issue. |
05:44 |
|
slef |
http://www.fsf.org/blogs/sysadmin/lists |
05:48 |
|
thd |
slef: well yes that was a point of significant discussion and a special presentation at the FSF members meeting. |
05:52 |
|
thd |
slef: at least the FSF mail system is not any worse than I have experienced on Sourceforge. All Koha devel messages are at least appearing in the log without loss eventually :) |
06:34 |
|
slef |
thd++ |
07:50 |
|
pierrick |
kados, are you around? |
08:14 |
|
kados |
pierrick: am now |
08:15 |
|
pierrick |
I'm testing the zebra plugin (still installing it) |
08:15 |
|
kados |
cool |
08:15 |
|
pierrick |
mis/missing090field.pl is not in rel_2_2 but in HEAD |
08:15 |
|
pierrick |
(and it should not be, I suppose) |
08:16 |
|
kados |
hmmm |
08:16 |
|
kados |
right |
08:16 |
|
kados |
could you commit it to rel_2_2? |
08:16 |
|
pierrick |
I you want |
08:16 |
|
pierrick |
if you want |
08:16 |
|
kados |
yep |
08:18 |
|
pierrick |
done |
08:18 |
|
kados |
thx |
08:25 |
|
kados |
paul_away: are you here? |
08:25 |
|
pierrick |
pierrickplegall:~/dev/koha/head/misc/zebra$ wc -l ./unimarc/zebra.cfg ./usmarc/zebra.cfg |
08:25 |
|
pierrick |
31 ./unimarc/zebra.cfg |
08:25 |
|
pierrick |
65 ./usmarc/zebra.cfg |
08:26 |
|
pierrick |
should I suppose unimarc zebra configuration file is not up to date at all? |
08:26 |
|
kados |
pierrick: yep :-) |
08:29 |
|
pierrick |
are there difference between the two files? |
08:29 |
|
kados |
well, yes |
08:29 |
|
pierrick |
(should there be differences?) |
08:29 |
|
kados |
I think so |
08:29 |
|
kados |
lemme look quickly |
08:29 |
|
pierrick |
:-/ |
08:30 |
|
kados |
pierrick: no they can be the same |
08:30 |
|
pierrick |
OK |
08:30 |
|
pierrick |
thx |
08:31 |
|
pierrick |
kados, you created a "kohaplugin" user on your system? |
08:34 |
|
thd |
kados: what causes this error from bulkmarcimport.pl on any MARC record: Can't locate object method "as_xml" via package "MARC::Record" ? |
08:37 |
|
thd |
kados: I last updated MARC::XML a few weeks ago but the current CPAN version has a make test error. |
08:38 |
|
thd |
kados: rel_2_2 is now so broken for me that I cannot import even one record :( |
08:38 |
|
kados |
pierrick: no, no kohaplugin user |
08:38 |
|
kados |
thd: you must use the sourceforge version of MARC::File::XML |
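(For context: as_xml() is not defined in MARC::Record itself; MARC::File::XML injects it into MARC::Record when loaded. So the "can't locate object method" error means the installed MARC::File::XML is missing or too old, roughly:

    use MARC::Record;
    use MARC::File::XML;          # loading this adds as_xml() to MARC::Record

    my $xml = $record->as_xml();  # fails if MARC::File::XML was never loaded)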
08:39 |
|
kados |
thd: the cpan version has a make test error? |
08:39 |
|
thd |
kados: yes I was using the CPAN version |
08:40 |
|
kados |
thd: did you install MARC::Charset? |
08:40 |
|
kados |
thd: check kohadocs.org for instructions on installing the latest sourceforge versions |
08:40 |
|
kados |
thd: my 'installing on debian' document has details |
08:40 |
|
thd |
kados: MARC::Charset and MARC::Record are up to date |
08:41 |
|
kados |
thd: MARC::Record needs to be installed from Sourceforge |
08:41 |
|
kados |
thd: as with MARC::Charset |
08:41 |
|
kados |
thd: the CPAN versions aren't up to date |
08:46 |
|
thd |
kados: I have succeeded slowly in capturing more MARC records by gradually adding targets and searching the correct form of serial titles. I am a little less than half way through the 615 records. |
08:46 |
|
thd |
s/615/no hits from 615/ |
08:48 |
|
kados |
thd: do those MARC records have the same encoding probs as the first batch? |
08:48 |
|
kados |
(out of curiosity) |
08:49 |
|
thd |
kados: I think the only encoding problems were some native Alaskan names. |
08:50 |
|
thd |
kados: I cannot import any record at the moment so I cannot determine encoding problems even for my old ASCII only records :0 |
08:51 |
|
thd |
kados: Why is CPAN behind? |
08:51 |
|
kados |
thd: CPAN: it's a long story |
08:51 |
|
kados |
thd: involving some people at follett |
08:51 |
|
thd |
kados: I thought that you were updating CPAN to avoid this problem |
08:52 |
|
kados |
thd: yes, I do have access to MARC::File::XML finally |
08:52 |
|
kados |
thd: but sourceforge version has some untested functions that need to be tested before being put in CPAN |
08:52 |
|
kados |
hey slef |
08:53 |
|
kados |
slef: you've dealt with encoding issues in perl, right? |
08:53 |
|
kados |
slef: utf8 "\xEB" does not map to Unicode at ./afognak2koha.pl line 25, <INFILE> line 1. |
08:53 |
|
kados |
slef: is there any way to get Encode to just warn rather than die on that error? |
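A minimal sketch of one way, assuming the file is read line by line as in afognak2koha.pl: open the handle raw and decode explicitly, choosing the failure mode via Encode's CHECK argument (FB_DEFAULT substitutes U+FFFD for malformed bytes, FB_WARN warns, FB_CROAK dies):

    use Encode qw( decode FB_DEFAULT );

    open my $in, '<:raw', $infile or die "can't open $infile: $!";
    while ( my $line = <$in> ) {
        my $text = decode( 'UTF-8', $line, FB_DEFAULT );  # mangles, never dies
        # ... process $text ...
    }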
08:54 |
|
thd |
kados: what about the possibility that the MARC::Charset mapping or Unicode is incomplete for native Alaskan languages? |
08:54 |
|
kados |
it's possible |
08:55 |
|
kados |
thd: but line 1 is : InDesign CS2 for dummies |
08:55 |
|
slef |
what's afognak2koha.pl line 25 ? |
08:55 |
|
kados |
slef: while (my $line = <INFILE>) { |
08:56 |
|
kados |
where INFILE is: |
08:56 |
|
kados |
open(INFILE, "<:utf8",$infile); |
08:56 |
|
thd |
kados: well if that is the record how could there be any non-ASCII characters? |
08:56 |
|
kados |
thd: I have no idea, there must be something going on somewhere in that yaz or php stuff |
08:58 |
|
slef |
kados: why are you trying to open a non-utf8 $infile with :utf8? |
08:58 |
|
kados |
slef: either way I get the error |
08:58 |
|
kados |
slef: the reason is because there are some wide chars in the file |
08:59 |
|
slef |
what encoding is $infile? |
08:59 |
|
kados |
slef: it's one of the problems with a batch file of marc records - they can lie about their encoding ... claiming to be marc-8 but actually some other encoding |
08:59 |
|
kados |
slef: it seems to be a combo of encodings :-) |
08:59 |
|
kados |
slef: but I'm assuming either mostly 8859 or marc-8 |
09:00 |
|
kados |
slef: with a few hundred wide chars thrown in |
09:00 |
|
slef |
8859-which? Can file figure it out? |
09:00 |
|
kados |
hmmm ... not sure how to do that |
09:01 |
|
slef |
file yourbatchfile # on the command line |
09:01 |
|
kados |
k |
09:01 |
|
kados |
file ../alaska_mrc.mrc # |
09:01 |
|
kados |
../alaska_mrc.mrc: data |
09:01 |
|
slef |
you can try <:raw instead, but you may end up outputting gibberish if you can't fix the encoding |
09:02 |
|
kados |
I'll try that |
09:02 |
|
slef |
Have you read man perlunicode? |
09:02 |
|
kados |
still dies on the first record |
09:02 |
|
kados |
yea, but I could probably do a re-read |
09:04 |
|
slef |
hrm, I guess you get to play "guess the encoding" if this is a one-off |
09:06 |
|
kados |
the thing that gets my goat is that it just dies |
09:06 |
|
kados |
I'd be fine with it just warning and mangling a single character in that record |
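For what it's worth, MARC::Batch has exactly that switch for records that fail to parse (it won't help with a die coming from a PerlIO :utf8 layer, which needs the decode() approach sketched earlier). File name here is illustrative:

    use MARC::Batch;

    my $batch = MARC::Batch->new( 'USMARC', 'alaska_mrc.mrc' );
    $batch->strict_off();    # errors while reading are no longer fatal
    while ( my $record = $batch->next() ) {
        my @warnings = $batch->warnings();   # what, if anything, went wrong
        # ... process $record ...
    }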
09:07 |
|
thd |
kados: maybe the issue for that record is an em dash, if "440 0 $a --For dummies" had one. However, it seems to have two hyphens. |
09:08 |
|
thd |
kados: That is the second captured record. The second captured record had no problem for me. |
09:09 |
|
slef |
kados: well, you're asserting that it's utf8 when it isn't. Could you just open() it and read it in and *then* test it? |
09:09 |
|
kados |
slef: whether I open as utf-8 or not it dies with the same error |
09:10 |
|
kados |
slef: I just tried 'raw' with the same results |
09:10 |
|
slef |
kados: sounds like something is ignoring you. Can you publish script and test data? |
09:10 |
|
kados |
yep |
09:11 |
|
slef |
I'll take a look on the ramfs here |
09:12 |
|
kados |
http://kados.org/afognak2koha.pl |
09:12 |
|
kados |
http://kados.org/alaska_mrc_extra_val.csv |
09:13 |
|
thd |
s/second captured/first captured/ |
09:13 |
|
kados |
slef: I typically run it like this: |
09:13 |
|
kados |
./afognak2koha.pl alaska_mrc_extra_val.csv all.mrc alldump.txt |
09:13 |
|
thd |
kados: you did not say last night that you had problems with the first captured record. |
09:13 |
|
kados |
where the 'all.mrc' is the output |
09:13 |
|
kados |
thd: sorry, it's not the first record |
09:14 |
|
thd |
kados: which record is it for you? |
09:14 |
|
kados |
thd: it's number 170 or something |
09:14 |
|
kados |
171 actually |
09:14 |
|
kados |
in this version of the script I've got the whole operation wrapped in 'eval' |
09:15 |
|
kados |
records 170 and 308 throw errors |
09:15 |
|
kados |
in eval |
09:15 |
|
kados |
the rest seem to go in ok |
09:15 |
|
kados |
but then when I try to bulkmarcimport |
09:15 |
|
kados |
rather than 479 records |
09:15 |
|
kados |
I only get 398 |
09:17 |
|
thd |
kados: are you certain about the title? |
09:17 |
|
kados |
thd: no, I was incorrect about that record |
09:17 |
|
kados |
thd: sorry about that |
09:17 |
|
pierrick |
kados, zebra plugin is working on my 2.2 :-) |
09:17 |
|
kados |
pierrick: great! |
09:17 |
|
slef |
kados: ok, ready for an annoying thing? |
09:17 |
|
kados |
slef: sure |
09:17 |
|
slef |
kados: the data reading part works OK here. |
09:18 |
|
kados |
slef: meaning you get all the way through the file? |
09:18 |
|
kados |
slef: that's probably because of the eval |
09:18 |
|
kados |
maybe not though |
09:18 |
|
kados |
hmmm |
09:18 |
|
slef |
no, I commented all the MARC stuff as the ramfs box hasn't got MARC::* installed |
09:18 |
|
kados |
ahh |
09:18 |
|
slef |
or at least not MARC/File/XML |
09:18 |
|
kados |
hmm |
09:18 |
|
kados |
I think i was wrong about that reading problem |
09:19 |
|
kados |
it seems to be working now |
09:19 |
|
kados |
grrr |
09:20 |
|
slef |
I think you need to test and encode before |
09:20 |
|
slef |
$record = MARC::File::USMARC->decode($marcrecord); # or warn "not this record\n"; |
09:20 |
|
kados |
slef: if you comment out the first eval |
09:20 |
|
slef |
as you say |
09:20 |
|
kados |
record 171 throws: |
09:20 |
|
slef |
MARC::Charset->assume_unicode(1); |
09:20 |
|
kados |
utf8 "\xEC" does not map to Unicode |
09:20 |
|
kados |
yea, but I've tried with and without that lie |
09:20 |
|
kados |
line |
09:20 |
|
kados |
same end result |
09:21 |
|
kados |
slef: so line 171 has a non-mapping character in it |
09:21 |
|
slef |
replace the eval with |
09:21 |
|
kados |
utf8 "\xEC" does not map to Unicode at /usr/local/lib/perl/5.8.4/Encode.pm line 167, <INFILE> line 171. |
09:21 |
|
slef |
if (decode_utf8($marcrecord)) |
09:22 |
|
slef |
may need to use Encode 'decode_utf8'; too - I forget. |
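One caveat worth noting: with no CHECK argument, decode_utf8() substitutes bad bytes rather than failing, so the bare if() above is always true for non-empty input. A validity test needs a strict decode wrapped in eval, decoding a copy since a true CHECK may modify its input:

    use Encode qw( decode FB_CROAK );

    my $ok = eval { decode( 'UTF-8', my $copy = $marcrecord, FB_CROAK ); 1 };
    warn "record is not valid UTF-8, skipping\n" unless $ok;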
09:22 |
|
kados |
k |
09:22 |
|
slef |
if that still works, I'll write something |
09:24 |
|
kados |
utf8 "\xEC" does not map to Unicode at /usr/local/lib/perl/5.8.4/Encode.pm line 167, <INFILE> line 171. |
09:24 |
|
kados |
I did: |
09:24 |
|
kados |
if (Encode::decode_utf8($marcrecord)) { |
09:25 |
|
kados |
at one point I tried |
09:25 |
|
kados |
use Encode qw /WARN_ON_ERR/ |
09:25 |
|
kados |
slef: unfortunately, the cpan version of MARC::* don't support unicode |
09:25 |
|
kados |
slef: to work with unicode with MARC::* you need to grab the sourceforge versions |
09:26 |
|
kados |
slef: http://www.kohadocs.org/Instal[…]Debian_sarge.html |
09:26 |
|
kados |
slef: cvs listed there |
09:27 |
|
slef |
this installation sucks atm |
09:28 |
|
kados |
no kidding |
09:28 |
|
pierrick |
kados, could you add the instruction "run # zebrasrv localhost:2100/kohaplugin" in your documentation ? |
09:28 |
|
kados |
pierrick: it's not there? |
09:28 |
|
slef |
uh, isn't there a tarball of marcpm? |
09:29 |
|
kados |
pierrick: I'll tell stephen to add it |
09:29 |
|
kados |
slef: yea, on the sourceforge page you can nab it |
09:30 |
|
slef |
bah, I'm going to have to leave this and get back to work |
09:30 |
|
kados |
k ... thanks :-) |
09:32 |
|
slef |
what's the current state of koha support modules debian packages? |
09:33 |
|
kados |
completely unmaintained to my knowledge |
09:34 |
|
slef |
libyaz-dev seems to be there |
09:34 |
|
kados |
yea, but libyaz-dev what version? |
09:34 |
|
slef |
I'd upload libperl-marc-* if someone has time to build them, or you can wait for me to remember how perl debs work. |
09:35 |
|
slef |
libyaz-dev 2.1.8 |
09:35 |
|
slef |
what do we need? |
09:35 |
|
kados |
the very latest :-) |
09:35 |
|
pierrick |
2.1.16 |
09:35 |
|
slef |
can you add the versions needed to http://www.kohadocs.org/Instal[…]Debian_sarge.html please? |
09:35 |
|
kados |
yep |
09:36 |
|
kados |
well, it keeps changing |
09:36 |
|
kados |
on like a weekly basis |
09:36 |
|
kados |
esp with all the work being done on MARC::* |
09:36 |
|
pierrick |
kados, the latest version of zebra available on indexdata site is 1.3.34 |
09:36 |
|
slef |
can you add them to koha/Makefile.PL in CVS instead then, please? |
09:36 |
|
slef |
it really would help me get the installer working |
09:36 |
|
kados |
I don't have time to track everything down |
09:36 |
|
kados |
pierrick: could you do it? |
09:37 |
|
pierrick |
OK |
09:37 |
|
kados |
thx |
09:37 |
|
kados |
hi paul |
09:37 |
|
kados |
paul: still aiming for the 18th for release date? |
09:38 |
|
slef |
could I just set everything to today's versions? |
09:38 |
|
paul |
kados : no, of course. |
09:38 |
|
kados |
slef: yep |
09:38 |
|
slef |
as in, does it work today? ;-) |
09:38 |
|
kados |
paul: good :-0 I was beginning to worry :-) |
09:38 |
|
kados |
slef: well, not quite |
09:38 |
|
paul |
as there are still some major bugs. |
09:38 |
|
kados |
slef: we still have major encoding troubles |
09:38 |
|
kados |
slef: similar but different to the ones we discussed today |
09:38 |
|
slef |
how about 2.4? what libyaz does that need? |
09:39 |
|
kados |
slef: depends on if you use the zebra plugin or not |
09:39 |
|
pierrick |
standard 2.4 doesn't need zebra |
09:39 |
|
kados |
slef: if you use zebra, it needs the very latest |
09:39 |
|
slef |
standard 2.4 |
09:39 |
|
kados |
slef: otherwise, it doesn't need the very latest |
09:39 |
|
kados |
pierrick: but it does need yaz |
09:39 |
|
pierrick |
oups |
09:39 |
|
kados |
slef: but 2.4 does require very latest MARC::* |
09:40 |
|
kados |
slef: and a MARC::File::XML that hasn't been written yet :-) |
09:41 |
|
kados |
what's unstable? |
09:41 |
|
pierrick |
MARC::* |
09:41 |
|
kados |
how so? |
09:41 |
|
pierrick |
using the HEAD CVS is dangerous |
09:41 |
|
pierrick |
kados, it seems we have no choice if we want zebra working |
09:41 |
|
kados |
right, but MARC::* isn't like Koha, their HEAD almost always works |
09:42 |
|
pierrick |
so my sentence is useless, just to say I think "we are playing with fire" |
09:42 |
|
kados |
yea |
09:43 |
|
paul |
kados: a quick question about openCataloger |
09:43 |
|
pierrick |
so, do I add something to Makefile.PL? (on rel_2_2) |
09:44 |
|
kados |
paul: sure |
09:44 |
|
paul |
ToinS is writing a document to explain what he will work on, and how. |
09:44 |
|
paul |
he seems confident with XUL now ! |
09:44 |
|
kados |
great! |
09:44 |
|
paul |
do we create a openCat project on savannah ? |
09:45 |
|
paul |
maybe it could be a good idea to play with subversion with openCat ? |
09:45 |
|
kados |
paul: sure |
09:45 |
|
paul |
x2 ? |
09:45 |
|
kados |
yep |
09:46 |
|
kados |
I'll warn you that savannah takes forever to register a project |
09:46 |
|
kados |
I tried to register openncip there |
09:46 |
|
kados |
eventually I gave up and went to sourceforge |
09:46 |
|
kados |
a month later they accepted openncip |
09:46 |
|
paul |
ok, then maybe another OSS platform ? |
09:46 |
|
kados |
but requested I change the name to freencip :-) |
09:46 |
|
kados |
the nerve! |
09:46 |
|
kados |
I'm ok with whatever |
09:47 |
|
kados |
you can decide |
09:48 |
|
kados |
paul: did you see my mention of a bug in syncing between koha tables and marc tables? |
09:48 |
|
pierrick |
it took less than a day to register PEM on gna.org |
09:48 |
|
thd |
kados: freedom is better than openness any day |
09:48 |
|
kados |
paul: on koha-devel |
09:48 |
|
paul |
kados: not yet |
09:49 |
|
kados |
paul: it seems that Koha never deletes old holdings entries from marc_word |
09:49 |
|
kados |
paul: I will file a bug report and mark it as blocker |
09:50 |
|
kados |
paul: email was: Apr 12 Joshua Ferraro ( 23) [Koha-devel] Bug in MARC sync in rel_2_2 |
09:51 |
|
kados |
bug report created |
10:13 |
|
pierrick |
kados, zebra was working, but I should not have tried to solve the encoding problems... nothing works anymore |
10:14 |
|
slef |
never mind hosting services, you have your own webspace, you have git/cogito, host it yourself |
10:38 |
|
kados |
pierrick: could you explain? |
10:41 |
|
pierrick |
kados, it seems I can't update my zebra database after recreating it |
10:41 |
|
kados |
recreation? |
10:41 |
|
pierrick |
drop/create |
10:42 |
|
pierrick |
I had encoding problem in what was displayed |
10:42 |
|
kados |
drop doesn't work in zebra |
10:42 |
|
kados |
you have to go: |
10:42 |
|
kados |
zebraidx init |
10:42 |
|
kados |
which deletes everything |
10:43 |
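A minimal sketch of that re-initialisation, wrapped in Perl; it assumes zebraidx is on the PATH, is run from the directory containing zebra.cfg, and that the exported records live in a records/ directory:

    use strict;
    use warnings;

    # 'init' wipes the register files entirely, as kados says; 'update'
    # then re-indexes from scratch, and 'commit' makes the new index
    # visible (needed when running with shadow registers).
    system('zebraidx', 'init')              == 0 or die "init failed: $?";
    system('zebraidx', 'update', 'records') == 0 or die "update failed: $?";
    system('zebraidx', 'commit')            == 0 or die "commit failed: $?";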
|
pierrick |
OK, in fact my problem seems to be my export |
10:44 |
|
kados |
ahh |
10:44 |
|
pierrick |
I've converted my marc_subfield_table to utf8 and now my export is very small |
10:45 |
|
pierrick |
(it was 6MB before and now it's 0.2MB) |
10:45 |
|
kados |
that can't be good |
10:47 |
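A sketch of the table conversion with the kind of before/after sanity check kados suggests below; the DSN and credentials are placeholders, and CONVERT TO only does the right thing if the stored bytes really are latin1 to begin with:

    use strict;
    use warnings;
    use DBI;

    # Placeholder connection details for the local Koha database.
    my $dbh = DBI->connect('dbi:mysql:Koha', 'kohaadmin', 'secret',
                           { RaiseError => 1 });

    my ($before) = $dbh->selectrow_array(
        'SELECT COUNT(*) FROM marc_subfield_table');

    # Transcode the table's text columns from latin1 to utf8 in place.
    $dbh->do('ALTER TABLE marc_subfield_table CONVERT TO CHARACTER SET utf8');

    my ($after) = $dbh->selectrow_array(
        'SELECT COUNT(*) FROM marc_subfield_table');

    print "rows before: $before, rows after: $after\n";   # should match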
|
pierrick |
should I use export.pl in 2.2 or HEAD? |
10:52 |
|
pierrick |
wait... I made a mistake in the utf8 conversion procedure |
10:54 |
|
kados |
pierrick: export.pl in 2.2 I think |
10:54 |
|
pierrick |
no, I reverted it and my export is still 0.2MB :-/ I really don't understand any of this :-/ |
10:55 |
|
kados |
weird |
10:55 |
|
kados |
pierrick: did you do 'select count(*) from biblio' |
10:55 |
|
kados |
before and after? |
10:55 |
|
pierrick |
what is a normal size for an isoXXX export? |
10:55 |
|
pierrick |
I didn't |
10:56 |
|
pierrick |
did the export delete my biblios???? |
10:56 |
|
kados |
no, it shouldn't |
10:56 |
|
kados |
it just exports it :-) |
10:56 |
|
kados |
I mean do the select count(*) before converting to utf8 |
10:56 |
|
kados |
but that doesn't make sense |
10:56 |
|
kados |
sorry :-) |
10:56 |
|
pierrick |
no that doesn't |
10:57 |
|
pierrick |
if it did, we would have a big problem for Koha 3.0 conversion to utf8 |
10:58 |
|
pierrick |
I have 10K biblio |
10:59 |
|
pierrick |
kados, how do I "select count(*) from zebra"? |
10:59 |
|
kados |
not sure |
10:59 |
|
kados |
maybe ask koha-zebra? |
11:00 |
|
kados |
I'm interested too :-) |
11:00 |
|
paul |
iirc, I asked the indexdata mailing list, and got an answer. |
11:00 |
|
paul |
it was something strange & long |
11:02 |
|
paul |
Jan 9th |
11:02 |
|
paul |
Paul POULAIN wrote: |
11:02 |
|
paul |
> Hi, |
11:02 |
|
paul |
> |
11:02 |
|
paul |
> Is there a tool to get some info on a zebra DB? |
11:02 |
|
paul |
> something like the number of records in the DB, and other related info? |
11:02 |
|
paul |
You can get some info like that out of Zebra by searching the Explain database... Using the YAZ client: |
11:02 |
|
paul |
% yaz-client host:port/IR-Explain-1 |
11:02 |
|
paul |
Z> find @attr exp1 1=1 databaseInfo |
11:02 |
|
paul |
Z> form xml |
11:02 |
|
paul |
Z> show 1 |
11:02 |
|
paul |
etc. |
11:02 |
|
paul |
The XML representation of Explain records is private to Zebra, but much easier to handle than the more standard alternatives. |
11:02 |
|
paul |
You can retrieve information about a specific database name like this: |
11:02 |
|
paul |
Z> f @and @attr exp1 1=1 databaseInfo @attr exp1 1=3 myDatabaseName |
11:02 |
|
paul |
The contents should be self-explanatory. |
11:02 |
|
paul |
You can also ask for targetInfo, AttributeSetInfo (per database, as before), and possibly other things.. |
11:02 |
|
paul |
(my question & sebastian hammer answer) |
11:02 |
|
kados |
cool |
11:03 |
|
paul |
a phpYazAdmin would be something useful ;-) |
11:03 |
|
kados |
I get: |
11:03 |
|
kados |
Sent presentRequest (1+1). |
11:03 |
|
kados |
Records: 1 |
11:03 |
|
kados |
[IR-Explain-1]Record type: XML |
11:03 |
|
kados |
<explain><databaseInfo>DatabaseInfo |
11:04 |
|
kados |
<commonInfo><dateAdded>20060401200734</dateAdded><dateChanged>20060402182248</dateChanged><languageCode>EN</languageCode></commonInfo><accessInfo><unitSystems><string>ISO</string></unitSystems><attributeSetIds><oid>1.2.840.10003.3.5</oid><oid>1.2.840.10003.3.1</oid></attributeSetIds></accessInfo><name>kohaplugin</name><userFee>0</userFee><available>1</available><recordCount><recordCountActual>148636</recordCountActual></recordCount><zebraInfo><recordBytes>133463477</ |
11:04 |
|
kados |
which looks at least in the 'ballpark' :-) |
11:04 |
|
kados |
but it would be very interesting to do |
11:04 |
|
kados |
select count(*) from biblio; |
11:04 |
|
kados |
then install zebra plugin |
11:04 |
|
kados |
then do the above |
11:04 |
|
kados |
and compare |
11:05 |
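That comparison could be scripted along these lines; a sketch only, assuming the ZOOM Perl bindings, a placeholder Zebra host/port, and the same Explain search Sebastian describes above:

    use strict;
    use warnings;
    use DBI;
    use ZOOM;

    # Biblio count on the MySQL side (placeholder credentials).
    my $dbh = DBI->connect('dbi:mysql:Koha', 'kohaadmin', 'secret',
                           { RaiseError => 1 });
    my ($mysql_count) = $dbh->selectrow_array('SELECT COUNT(*) FROM biblio');

    # Record count on the Zebra side, via the Explain database.
    my $conn = ZOOM::Connection->new('localhost:9999/IR-Explain-1');
    $conn->option(preferredRecordSyntax => 'xml');
    my $rs  = $conn->search_pqf('@attr exp1 1=1 databaseInfo');
    my $xml = $rs->record(0)->render();
    my ($zebra_count) =
        $xml =~ m{<recordCountActual>(\d+)</recordCountActual>};

    print "mysql: $mysql_count, zebra: $zebra_count\n";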
|
pierrick |
I'm leaving office now, I'll continue my headache on zebra on tuesday :-) |
11:05 |
|
kados |
pierrick: have a great weekend :-) |
11:06 |
|
kados |
paul: about encoding probs |
11:06 |
|
kados |
paul: (before you leave) |
11:06 |
|
kados |
paul: I'll give you an update |
11:06 |
|
kados |
what I think is happening |
11:06 |
|
kados |
is that Perl is handing the XML parser data that |
11:07 |
|
kados |
/Perl/ thinks is utf-8. In other words, perl is mangling the data going into the parser, possibly turning it into valid, correct UTF8, but the parser has been told that this is /not/ in fact UTF8, but ISO |
11:07 |
|
kados |
so I need to tell Perl that the data in the $xml variable /is not utf8/ |
11:07 |
|
kados |
but I"m not sure how to do this yet |
11:09 |
|
paul |
doesn't perldoc Encode give you a hint here? |
11:09 |
|
kados |
I'll check |
11:10 |
|
kados |
but we want to avoid Encode completely I think |
11:27 |
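For the record, the two usual ways of doing what kados describes; a sketch only. Encode::_utf8_off flips the flag without touching the bytes, and utf8::downgrade is the Encode-free route (it dies if the string contains code points above 0xFF):

    # Tell Perl the data in $xml is raw octets, not character data,
    # before handing it to the XML parser.
    use Encode ();
    Encode::_utf8_off($xml);   # clear the UTF8 flag; bytes are untouched

    # Or, avoiding Encode completely:
    utf8::downgrade($xml);     # downgrade to octets; dies on wide characters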
|
paul |
kados: do we call our project OpenCat, OpenCataloger or OpenCataloguer? |
11:28 |
|
kados |
paul: is OpenCat taken? |
11:28 |
|
paul |
I don't think so |
11:28 |
|
kados |
paul: ok, lets use it then |
11:28 |
|
kados |
paul: wait |
11:29 |
|
kados |
paul: opencat.org is taken |
11:29 |
|
paul |
opencat seems to be something for google |
11:29 |
|
kados |
paul: whereas I own opencataloger.org/com |
11:29 |
|
paul |
ok, so let's start with opencataloger then |
11:29 |
|
kados |
paul: sounds good |
11:31 |
|
paul |
ok, opencataloger registered at gna.org |
11:31 |
|
paul |
waiting for confirmation |
11:31 |
|
kados |
great! |
11:34 |
|
kados |
paul: so re: encoding |
11:35 |
|
kados |
paul: do we agree that Koha will support handling MARC-8, iso-8859, utf-8 |
11:35 |
|
kados |
it will only allow storage of utf-8 and iso-8859 |
11:35 |
|
kados |
and will not permit mixed encoding |
11:35 |
|
kados |
ie, if 8859-1, only 8859-1 |
11:36 |
|
kados |
if utf? |
11:36 |
|
kados |
s/if utf?// |
11:37 |
|
paul |
yes. |
11:37 |
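The MARC-8 leg of that policy would presumably lean on MARC::Charset; a minimal sketch, with an illustrative input string:

    use strict;
    use warnings;
    use MARC::Charset qw(marc8_to_utf8);

    # Incoming MARC-8 bytes get converted on the way in, so that only
    # utf-8 (or iso-8859) is ever stored -- never a mix of encodings.
    my $marc8_field = 'illustrative MARC-8 encoded data';
    my $utf8_field  = marc8_to_utf8($marc8_field);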
|
paul |
except it's iso8859-15 and not -1 |
11:37 |
|
paul |
but the only diff is the € |
11:38 |
|
kados |
ahh, well that would explain something |
11:38 |
|
kados |
maybe it already works perfectly |
11:38 |
|
kados |
I thought it was 8859-1 |
11:38 |
|
paul |
8859-1 was transformed into -15 when the EU switched to the € |
11:38 |
|
kados |
could you try the latest rel_2_2 MARC edits of existing data in emn, for instance? |
11:39 |
|
paul |
but the only diff between them is the € symbol |
11:39 |
|
kados |
after changing to 8859-15 in TemplateEncoding var? |
11:39 |
|
paul |
yes, but not right now, as I must leave (7PM soon) |
11:43 |
|
paul |
bye bye world |
11:44 |
|
kados |
mye paul |
11:44 |
|
kados |
bye even :-) |