Time |
S |
Nick |
Message |
11:00 |
|
pierrick_ |
I don't know exactly what hdl's problem was, but I'm working on utf-8 handling for 3.0. I've already encountered a problem with 2.2: when I search '%ère%' (as in 'ministère'), it also returns things like 'Simonne CAILLERE' |
11:01 |
|
paul |
that's not a bug, that's a feature ! |
11:01 |
|
pierrick_ |
this is quite smart of MySQL, but it is not what I expect |
11:01 |
|
kados |
hehe |
11:01 |
|
paul |
thus, you need a MySQL collation other than utf8_unicode_ci |
11:01 |
|
hdl |
but this is also what librarians expect. |
11:01 |
|
paul |
something like utf8_bin should be your friend ! |
11:01 |
|
kados |
pierrick_: with zebra, you have quite a bit of control over such behavior |
11:02 |
|
paul |
(although I agree with hdl: I don't understand why you consider this wrong) |
11:02 |
|
kados |
pierrick_: http://indexdata.dk/zebra/doc/data-model.tkl |
11:02 |
|
paul |
to all : I think we should clearly separate the UTF8 problem with MySQL from the UTF8 problem with zebra. I think this one is a MySQL problem, isn't it ? |
11:02 |
|
pierrick_ |
paul: I've read the MySQL documentation about the MySQL connection more than twice; maybe I missed something, I'm going to test one more time |
11:02 |
|
hdl |
some of them are quite old and always type data in capital letters. |
11:03 |
|
paul |
pierrick : maybe not, as unicode is a hard feature, for mySQL, as well as for Perl ! |
11:03 |
|
kados |
paul: on my rel_2_2 box I have no utf-8 problems |
11:03 |
|
hdl |
My problem is to work out clearly what is a MySQL problem and what is a zebra one. |
11:03 |
|
pierrick_ |
hdl: yes, google also transforms "é" to "e" |
11:04 |
|
hdl |
With accented letters, I encounter problems. |
11:04 |
|
hdl |
Whether it is with biblio or with borrowers. |
11:05 |
|
pierrick_ |
paul: "select author from biblio where author like '%ère%'" should not return "Simone CAILLERE" |
11:05 |
|
pierrick_ |
if Koha makes a transformation, that's OK, but MySQL should see a difference |
11:06 |
|
hdl |
Doesn't that depend on the collation or something like it ? |
11:06 |
|
pierrick_ |
hdl: I'm using default collation with latin1 character set, this is latin1_swedish_ci |
11:07 |
|
pierrick_ |
maybe I should test another collation, you're right |
11:07 |
|
paul |
_ci means "Case Insensitive". |
11:07 |
|
paul |
change it to "Case sensitive" and you should have a different result |
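The collation behaviour discussed above can be imitated outside MySQL. What follows is a minimal Python sketch, not MySQL's actual collation algorithm: it assumes that an accent- and case-insensitive collation behaves roughly like stripping diacritics and folding case before comparing, while a binary collation compares the strings exactly as stored. The helper names `fold` and `like_contains` are invented for this illustration.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip combining accents and fold case (rough stand-in for a *_ci collation)."""
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold()


def like_contains(value: str, needle: str, insensitive: bool) -> bool:
    """Emulate `value LIKE '%needle%'` under the two kinds of comparison."""
    if insensitive:
        return fold(needle) in fold(value)
    return needle in value


authors = ["ministère", "Simone CAILLERE"]
# Accent/case-insensitive comparison: both rows match '%ère%'.
loose = [a for a in authors if like_contains(a, "ère", insensitive=True)]
# Exact (binary-like) comparison: only the row that really contains 'ère' matches.
strict = [a for a in authors if like_contains(a, "ère", insensitive=False)]
print(loose)
print(strict)
```

Under these assumptions the loose search returns both rows, which is the librarian-friendly behaviour paul and hdl describe, and the strict search returns only 'ministère', which is what pierrick_ expected.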
11:08 |
|
hdl |
When playing with phpmyadmin I have NO display problems though. |
11:08 |
|
hdl |
This is quite embarrassing for me. |
11:08 |
|
paul |
that's why I was thinking it was a perl problem. |
11:09 |
|
paul |
and thought the .utf8 in env was a good solution. |
11:09 |
|
hdl |
either PERL or MySQL or perl with Mysql. |
11:09 |
|
pierrick_ |
with latin1_general_cs, through phpmyadmin, it still returns Simone |
11:10 |
|
hdl |
MySQL may return latin1 whatever collation you set it to... I read something about that yesterday. |
11:10 |
|
pierrick_ |
before installing phpmyadmin, I was using a perl script, with a "set names 'latin1'" before any query |
11:11 |
|
hdl |
http://lists.mysql.com/perl/3779 |
11:11 |
|
pierrick_ |
hdl: careful, there is a character set for the server, the database, the table, the column and the connection |
11:11 |
|
hdl |
Read this post. |
11:12 |
|
hdl |
About DBD::Mysql and utf8 |
11:13 |
|
pierrick_ |
OK |
11:14 |
|
hdl |
or try googling : DBD::mysql utf8 support |
11:14 |
|
kados |
hdl: do you have some MARC records with accented chars you can send me (the data you are having trouble with)? |
11:14 |
|
kados |
I will attempt to get it working on my test box |
11:14 |
|
paul |
just copy paste this : |
11:15 |
|
paul |
éàùÏ |
11:15 |
|
paul |
mmm... no. |
11:15 |
|
hdl |
xml or iso2709 ? |
11:15 |
|
paul |
they are not "true" utf8. |
11:15 |
|
paul |
NONE : |
11:15 |
|
kados |
hmmm |
11:15 |
|
kados |
I would need iso2709 |
11:15 |
|
hdl |
this channel is iso8859-1. |
11:15 |
|
paul |
the problem we are speaking of is a MySQL one for now. |
11:16 |
|
paul |
the zebra one is another thing. |
11:16 |
|
kados |
right ... so it's branch names, borrowers, etc. |
11:16 |
|
paul |
yep. |
11:16 |
|
hdl |
But it is linked. |
11:16 |
|
kados |
in that case, I can just copy/paste from websites |
11:16 |
|
paul |
yep |
11:16 |
|
kados |
hdl: I'm ok |
11:16 |
|
kados |
hdl: no need to send data |
11:16 |
|
kados |
first I must repair my HEAD box :-) |
11:17 |
|
hdl |
Since, when launching a rebuild_zebra.pl, you use MySQL data. |
11:17 |
|
hdl |
So if there is a perl/MySQL problem, when importing to zebra, you will import problems ;) |
11:19 |
|
kados |
hdl: http://kohatest.liblime.com/cg[…]admin/branches.pl |
11:20 |
|
kados |
hdl: this is the problem you're having? |
11:20 |
|
hdl |
no trespassing :/ |
11:21 |
|
hdl |
login/pass ? |
11:21 |
|
kados |
kohaadmin |
11:21 |
|
kados |
Bo0k52R3aD |
11:22 |
|
hdl |
Yes. |
11:22 |
|
kados |
hdl: and your locale is set to utf-8? |
11:22 |
|
hdl |
yes. |
11:23 |
|
hdl |
My locale, my AddDefaultCharset in Apache, my keyboard. |
11:23 |
|
kados |
hdl: what do you see when you type: locale ? |
11:23 |
|
kados |
I see mainly "en_US" |
11:23 |
|
kados |
on the kohatest machine |
11:24 |
|
hdl |
kados: http://pastebin.com/592599 |
11:25 |
|
paul |
mmm... strange. my /etc/sysconfig/i18n contains : |
11:25 |
|
paul |
SYSFONTACM=iso15 |
11:25 |
|
paul |
LANGUAGE=fr_FR.UTF8:fr |
11:25 |
|
paul |
LC_ADDRESS=fr_FR.UTF8 |
11:25 |
|
paul |
LC_COLLATE=fr_FR.UTF8 |
11:25 |
|
paul |
LC_NAME=fr_FR.UTF8 |
11:25 |
|
paul |
LC_NUMERIC=fr_FR.UTF8 |
11:25 |
|
paul |
LC_MEASUREMENT=fr_FR.UTF8 |
11:25 |
|
paul |
LC_TIME=fr_FR.UTF8 |
11:25 |
|
paul |
LANG=fr_FR.UTF8 |
11:25 |
|
paul |
LC_IDENTIFICATION=fr_FR.UTF8 |
11:25 |
|
paul |
LC_MESSAGES=fr_FR.UTF8 |
11:25 |
|
paul |
LC_CTYPE=fr_FR.UTF8 |
11:25 |
|
paul |
LC_TELEPHONE=fr_FR.UTF8 |
11:25 |
|
paul |
LC_MONETARY=fr_FR.UTF8 |
11:25 |
|
paul |
LC_PAPER=fr_FR.UTF8 |
11:25 |
|
paul |
SYSFONT=lat0-16 |
11:25 |
|
paul |
BUT : |
11:25 |
|
paul |
a set gives me ! |
11:26 |
|
paul |
LANG=fr_FR |
11:26 |
|
paul |
LANGUAGE=fr_FR:fr |
11:26 |
|
paul |
LC_ADDRESS=fr_FR |
11:26 |
|
paul |
LC_COLLATE=fr_FR |
11:26 |
|
paul |
LC_CTYPE=fr_FR |
11:26 |
|
paul |
LC_IDENTIFICATION=fr_FR |
11:26 |
|
paul |
LC_MEASUREMENT=fr_FR |
11:26 |
|
paul |
LC_MESSAGES=fr_FR |
11:26 |
|
paul |
... |
11:26 |
|
paul |
what am I doing wrong ? |
11:26 |
|
paul |
s/set/locale/ |
11:28 |
|
hdl |
Did you restart your computer after your sysconfig modification ? |
11:28 |
|
paul |
yep |
11:29 |
|
hdl |
paul : i18n should be fr_FR.UTF-8. You probably missed the hyphen (-) |
11:30 |
|
hdl |
you are used to mysql and perl ;) |
11:36 |
|
paul |
oups. no |
11:37 |
|
paul |
when I log as "paul" i'm not. |
11:37 |
|
paul |
when I su - i am. |
11:37 |
|
paul |
really strange... |
11:37 |
|
paul |
any idea someone ? |
11:37 |
|
kados |
paul: check your .bash* files |
11:38 |
|
kados |
could be that your charset is specified there? |
11:39 |
|
kados |
paul: http://dev.mysql.com/doc/refma[…]n.html?ff=nopfpls |
11:39 |
|
paul |
how could I see ? |
11:39 |
|
kados |
is that what you've done as of now? |
11:39 |
|
kados |
paul: in your /home/paul dir |
11:39 |
|
kados |
there are several .bash* files |
11:39 |
|
paul |
yes I know, but in .bashrc I don't see anything |
11:40 |
|
kados |
in .bash_profile? |
11:40 |
|
paul |
# Get the aliases and functions |
11:40 |
|
paul |
if [ -f ~/.bashrc ]; then |
11:40 |
|
paul |
. ~/.bashrc |
11:40 |
|
paul |
fi |
11:40 |
|
paul |
PATH=$PATH:$HOME/bin |
11:40 |
|
paul |
BASH_ENV=$HOME/.bashrc |
11:40 |
|
paul |
USERNAME="" |
11:40 |
|
paul |
export USERNAME BASH_ENV PATH |
11:40 |
|
paul |
xmodmap -e 'keycode 0x5B = comma' |
11:40 |
|
paul |
and that's all |
11:40 |
|
paul |
bashrc : |
11:40 |
|
paul |
alias rm='rm -i' |
11:40 |
|
paul |
alias mv='mv -i' |
11:40 |
|
kados |
I'm not sure then :/ |
11:40 |
|
paul |
alias cp='cp -i' |
11:40 |
|
paul |
[ -n $DISPLAY ] && { |
11:40 |
|
paul |
. /etc/profile.d/alias.sh |
11:40 |
|
paul |
} |
11:41 |
|
paul |
[ -z $INPUTRC ] && export INPUTRC=/etc/inputrc |
11:41 |
|
paul |
set $PATH=$PATH:/usr/local/kde/bin |
11:41 |
|
paul |
export PATH |
11:41 |
|
paul |
if [ -f /etc/bashrc ]; then |
11:41 |
|
paul |
. /etc/bashrc |
11:41 |
|
paul |
fi |
11:41 |
|
hdl |
you say I am not... But is it your keyboard or your locale that is not ? |
11:41 |
|
paul |
my locale. |
11:41 |
|
paul |
locale told me fr_FR |
11:41 |
|
paul |
except after a su - that says fr_FR.UTF-8 |
11:42 |
|
kados |
strange indeed |
11:42 |
|
hdl |
Have you loaded a keyboard or a system with your MCC ? |
11:43 |
|
hdl |
or in KDE control ? |
11:43 |
|
paul |
how can I check ? |
11:43 |
|
paul |
(you already told me but I don't remember) |
11:44 |
|
hdl |
(If I told you, I don't remember ;) ) |
11:44 |
|
hdl |
Open "Configurer votre ordinateur" (Configure your computer) and go to the keyboard section. |
11:46 |
|
hdl |
Test in console mode Ctrl Alt F2 and check if locale is the same. |
11:47 |
|
hdl |
(Sh.....!!!) kados: I can't search my base again. |
11:47 |
|
kados |
hehe |
11:47 |
|
hdl |
(I rebuilt the stuff once again today). |
11:48 |
|
paul |
"Configure your computer and go to keyboard" ==>>> I don't see what you mean |
11:53 |
|
hdl |
paul : under Hardware / Keyboard layout (matériel/disposition du clavier). |
11:53 |
|
hdl |
There are also the KDE accessibility settings. |
11:54 |
|
paul |
keyboard layout => you mean keyboard configuration ? |
11:54 |
|
paul |
and what do I choose ? |
11:58 |
|
thd |
kados: I know what happened for some issues of multiple authorities created where there should have only been one. |
11:58 |
|
kados |
thd: you found such instances? |
11:58 |
|
hdl |
I didn't see anything very conclusive. |
11:58 |
|
thd |
kados: look at William Faulkner. |
11:59 |
|
hdl |
I had to do it myself by hand, in the file /etc/sysconfig/keyboard |
11:59 |
|
hdl |
Normally, French. |
12:00 |
|
thd |
kados: Only half of those records used authority control for 100. |
12:01 |
|
thd |
kados: Some differences are because of the presence or absence of a full stop at the end of the field. |
12:02 |
|
kados |
thd: I'm currently working on some utf-8 probs in 3.0 |
12:02 |
|
thd |
kados: The values are not being normalised before the comparison is made. |
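The normalisation step thd describes could look something like the following Python sketch. It is only an illustration of the idea, not Koha's authority code: the function name `normalise_heading` is hypothetical, and the only rules assumed are the ones mentioned in the conversation (a trailing full stop) plus whitespace collapsing.

```python
def normalise_heading(heading: str) -> str:
    """Collapse runs of whitespace and drop a trailing full stop before comparing."""
    collapsed = " ".join(heading.split())
    return collapsed.rstrip(". ")


headings = [
    "Faulkner, William, 1897-1962.",
    "Faulkner, William, 1897-1962",
]

# Compared as-is, the two forms look like two different authorities.
distinct_raw = set(headings)
# Compared after normalisation, they collapse into a single authority.
distinct_normalised = {normalise_heading(h) for h in headings}
print(len(distinct_raw), len(distinct_normalised))
```

Comparing the normalised forms rather than the raw field values is what would keep a stray full stop from producing a duplicate authority record.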
12:02 |
|
kados |
thd: I hope to have some more time to work on authorities this afternoon |
12:03 |
|
thd |
kados: What I do not know is why bib records show as 0 |
12:03 |
|
kados |
thd: in the meantime, if you're finished with the MARC framework, you could start compiling a list of problems with the import :-) |
12:03 |
|
kados |
thd: so we can refine it :-) |
12:03 |
|
kados |
thd: I suspect that's a template prob ... I noticed it as well |
12:04 |
|
thd |
kados: by import you mean authority building? |
12:04 |
|
kados |
yep |
12:06 |
|
pierrick_ |
As I suspected, in the Koha 2.2 borrowers search, searching "anaë" returns "anaël" and "anaes". It might be smart, but it's wrong... |
12:08 |
|
pierrick_ |
kados: searching for an accented name also returns unaccented names |
12:09 |
|
kados |
and I suspect that's mysql being clever :-) |
12:09 |
|
kados |
you can probably turn off this feature if you don't like it |
12:09 |
|
kados |
you'd have to check the manual though |
12:09 |
|
paul |
pierrick_ : who told you it was wrong ? |
12:10 |
|
paul |
because from a librarian's point of view it's exactly what they want ! |
12:10 |
|
paul |
otherwise, a forgotten or wrong accent would make many data disappear ! |
12:10 |
|
kados |
so here is what I have learned so far about utf-8 and perl |
12:11 |
|
paul |
wait a little |
12:11 |
|
kados |
earlier versions of perl (before 5.6) did not distinguish between a byte and a character |
12:11 |
|
kados |
paul: ok |
12:12 |
|
hdl |
kados : can I try to summarise the operations needed to get a zebra base ? |
12:12 |
|
kados |
please do |
12:12 |
|
hdl |
1) modify zebracfg according to your US one. |
12:13 |
|
hdl |
And create tmp, shadow, lock directories. |
12:14 |
|
hdl |
2) then zebraidx create Nameofyourzebrabase (defined in /etc/koha.conf) |
12:14 |
|
hdl |
zebraidx commit |
12:14 |
|
hdl |
paul : working better ? |
12:14 |
|
paul |
still not utf-8 when logged as paul |
12:14 |
|
paul |
:-( |
12:15 |
|
hdl |
3) zebrasrv localhost:2100/nameofyourbase |
12:15 |
|
pierrick_ |
paul: it's wrong from a general point of view because it becomes impossible to search for anaël and not anaes |
12:16 |
|
pierrick_ |
but it might be a MySQL cleverness. I thought it was a Koha feature |
12:16 |
|
paul |
yes, but WHO wants to search for anaës and not anaes ? |
12:16 |
|
paul |
(in real life I mean) |
12:17 |
|
kados |
hdl: looks correct |
12:17 |
|
hdl |
4) launch rebuild_zebra.pl -c (on an updated base) wait and wait and wait.... |
12:17 |
|
pierrick_ |
we are talking about a stupidly simple example; imagine something like a Chinese character^W ideogram search |
12:17 |
|
kados |
pierrick_: the problem is, what if I want to search for that but don't have the correct keyboard? |
12:17 |
|
hdl |
5) zebraidx commit |
12:17 |
|
hdl |
(commit all the stuff) |
12:17 |
|
kados |
hdl: zebraidx commit should not be necessary |
12:18 |
|
pierrick_ |
kados: this is the reason why many applications offer a table of characters |
12:18 |
|
kados |
hdl: unless rebuild_zebra.pl does not use the correct subroutine in Biblio.pm to connect to z3950 extended services |
12:18 |
|
hdl |
kados: Yet it seems to be. |
12:18 |
|
kados |
hdl: it should use the z3950_extended_services() routine in the same way that bulkmarcimport.pl does |
12:18 |
|
hdl |
unless you committed a fix very recently. |
12:19 |
|
pierrick_ |
(but my "problem" is really not important at all, I admit) |
12:19 |
|
kados |
hdl: i haven't had time to ... |
12:19 |
|
hdl |
shadow gets full and there are no .mf files... |
12:20 |
|
kados |
maybe 4 gig isn't big enough? |
12:20 |
|
paul |
OK, I have to stop working on utf-8 atm |
12:20 |
|
kados |
paul: shall I explain what I have learned about utf-8? |
12:20 |
|
paul |
for sure ! |
12:20 |
|
paul |
(i'm still here, but answering an RFP) |
12:20 |
|
kados |
ok |
12:20 |
|
kados |
earlier versions of perl (before 5.6) did not distinguish between a byte and a character |
12:21 |
|
kados |
which is a major problem for unicode of course |
12:21 |
|
kados |
in perl 5.6 they wanted to: |
12:21 |
|
kados |
1. not break old byte-based programs |
12:22 |
|
kados |
when they were using byte-based characters |
12:22 |
|
kados |
2. allow byte-based programs to use character-based characters 'magically' |
12:23 |
|
kados |
to do this, perl uses bytes by default (at least in 5.6) |
12:23 |
|
kados |
to use character-based you must 'mark' the character-based interfaces so that perl knows to expect character-oriented data |
12:24 |
|
kados |
when they have been so marked, perl will convert all byte-based characters to utf-8 |
12:26 |
|
kados |
the bottom line is, we must explicitly tell perl we are working with utf-8 |
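The byte-versus-character distinction kados describes can be seen in a few lines. The sketch below uses Python rather than Perl, so it is only an analogy: Python 3 keeps `bytes` and `str` as separate types, whereas Perl marks character data with an internal flag. The point it illustrates is the same one, that the program has to be told explicitly that a run of bytes is UTF-8.

```python
# What arrives from a file, a socket or a database driver is a run of bytes.
raw = "ministère".encode("utf-8")

# Counting on the raw data counts bytes: 'è' takes two of them in UTF-8.
byte_count = len(raw)

# The explicit step: declare that these bytes are UTF-8 and get characters back.
text = raw.decode("utf-8")
char_count = len(text)

print(byte_count, char_count)
```

Until the decode step has happened, operations such as length, substring or case folding work on bytes and can give surprising answers for accented text.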
12:26 |
|
Sylvain |
Hi all |
12:27 |
|
paul |
another frenchy ! |
12:27 |
|
paul |
kados : you're right. |
12:27 |
|
kados |
Input and Output Layers |
12:27 |
|
kados |
Perl knows when a filehandle uses Perl's internal Unicode encodings (UTF-8, or UTF-EBCDIC if in EBCDIC) if the filehandle is opened with the ":utf8" |
12:27 |
|
kados |
layer. Other encodings can be converted to Perl's encoding on input or from Perl's encoding on output by use of the ":encoding(...)" layer. See |
12:27 |
|
kados |
open. |
12:27 |
|
paul |
and Encode::decode does this for variables. |
12:27 |
|
kados |
yep |
12:27 |
|
kados |
I see no way around this |
12:27 |
|
paul |
and DBD::mysql returns "non flagged" results. |
12:28 |
|
kados |
so we must again set the flag |
12:28 |
|
paul |
as everything is OK for Tümer, I suspected he had something that could explain this. |
12:28 |
|
paul |
it could be the utf8 config of his server (+ he's on Windows) |
12:28 |
|
kados |
are you sure he is testing with HEAD? |
12:28 |
|
paul |
no. |
12:28 |
|
kados |
because I could also say 'everything is OK' on my server |
12:29 |
|
paul |
in fact I think he's working with 2.2 but I could be wrong. |
12:29 |
|
paul |
pierrick_ and Sylvain : introduce yourself |
12:29 |
|
kados |
in that case, I suspect that as soon as he re-encodes mysql as utf-8 as in HEAD he will have the same problems we have |
12:29 |
|
paul |
:-( |
12:29 |
|
paul |
could you ask him ? |
12:29 |
|
kados |
sure |
12:30 |
|
paul |
I had a solution that seemed to work : |
12:30 |
|
paul |
I installed the mysqlPP driver and hacked it a little. |
12:30 |
|
paul |
it's a Pure Perl mysql driver. |
12:30 |
|
paul |
it worked it seems. I just Encode everything coming from mysql socket |
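The idea of decoding everything that comes back from the database can be sketched as follows. This is a Python illustration under stated assumptions, not paul's actual patch to the mysqlPP driver: it assumes a hypothetical driver that hands back text columns as raw UTF-8 bytes, and `decode_row` is a name invented for the example.

```python
from typing import Optional, Sequence, Tuple


def decode_row(row: Sequence[Optional[object]], encoding: str = "utf-8") -> Tuple[object, ...]:
    """Decode every bytes field of a row; leave numbers, None and str untouched."""
    return tuple(
        field.decode(encoding) if isinstance(field, (bytes, bytearray)) else field
        for field in row
    )


# A row as the hypothetical driver might return it: text columns are undecoded bytes.
raw_row = (42, "Anaël".encode("utf-8"), None)
decoded = decode_row(raw_row)
print(decoded)
```

Doing the conversion in one place, right where rows leave the driver, avoids having to sprinkle decode calls through every script that reads from the database.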
12:30 |
|
kados |
interesting |
12:31 |
|
kados |
paul: you will hate me to say this: what about switching to postgres? :-) |
12:31 |
|
paul |
no answer from mysql ? |
12:31 |
|
kados |
no answer from mysql |
12:31 |
|
paul |
I won't |
12:31 |
|
paul |
I just will say : why not, but that's a huge task ! |
12:31 |
|
kados |
does the Postgres driver (DBD::Pg) have the same problems? |
12:31 |
|
paul |
(+ complex for existing libraries) |
12:31 |
|
paul |
no. |
12:32 |
|
paul |
in fact the fix for mysql you can find on the net is a port from the fix for Postgres ! |
12:32 |
|
kados |
paul: how many hours do you estimate it would take? |
12:32 |
|
kados |
(I have been working with postgres with Evergreen and I must say it is much nicer than mysql) |
12:34 |
|
kados |
(though much harder to use) |
12:35 |
|
kados |
paul: would switching to postgres be harder than putting in 'Encode' everywhere? |
12:35 |
|
kados |
paul: in your opinion? |
12:35 |
|
paul |
yes because adding Encode is a boring but trivial task |
12:35 |
|
paul |
whereas switching to Postgres will cause some problems with DB structure & management |
12:36 |
|
kados |
ahh |
12:36 |
|
kados |
pierrick_: you have postgres experience, right? |
12:36 |
|
kados |
pierrick_: what is your opinion? |
12:36 |
|
kados |
morning owen |
12:36 |
|
owen |
Hi |
12:36 |
|
kados |
owen: the patron images stuff looks nice :-) |
12:37 |
|
hdl |
Under Windows, Mysql and apache may be utf8 by default. |
12:37 |
|
kados |
in case no one has seen it: |
12:37 |
|
kados |
http://koha.liblime.com/cgi-bi[…]dborrower=0054313 |
12:37 |
|
kados |
owen has created a very nice patronimages option in the NPL templates |
12:38 |
|
kados |
(of course, my pic is not available :-)) |
12:38 |
|
paul |
aren't you here joshua ? |
12:38 |
|
paul |
http://www.paulpoulain.com/pho[…]img_0061.jpg.html |
12:38 |
|
paul |
;-) |
12:39 |
|
paul |
lol |
12:39 |
|
kados |
hehe ... yes ... with long hair even :-) |
12:39 |
|
paul |
(still long ?) |
12:39 |
|
kados |
owen: hehe |
12:39 |
|
kados |
paul: no ... quite bald now :-) |
12:39 |
|
paul |
bald ? |
12:39 |
|
hdl |
Some information is not displayed well (to the right of the picture) |
12:39 |
|
kados |
paul: I shaved my head with a razor about a week ago :-) |
12:39 |
|
paul |
(same for me -konqueror-) |
12:39 |
|
paul |
wow ! |
12:40 |
|
kados |
paul: but the long hair was cut some months ago ... about 6 months in fact |
12:40 |
|
kados |
paul: right before LibLime's first conference :-) |
12:40 |
|
paul |
you're like a business man then now ? |
12:40 |
|
pierrick_ |
(I'm back... sorry kados, you asked a question, I'm going to answer) |
12:40 |
|
kados |
almost :-) |
12:42 |
|
hdl |
When I tried to install Koha on a Windows box, data had to be utf-8. |
12:42 |
|
hdl |
c< |
12:43 |
|
owen |
kados: I think your liblime color stylesheet is missing some CSS relating to the patron image. That might be why it's getting overlaid by the patron details |
12:45 |
|
pierrick_ |
So, my opinion about PostgreSQL ? |
12:46 |
|
pierrick_ |
If Koha uses MySQL InnoDB as table engine and utf8 as charset, I would say that it's worth switching to PostgreSQL |
12:47 |
|
pierrick_ |
my PostgreSQL experience is quite old in fact, I was working on it in 2002 on a Java CMS |
12:48 |
|
pierrick_ |
my internship was about making the CMS talk to MySQL or Oracle or PostgreSQL. In unicode because the customer was Asian |
12:49 |
|
pierrick_ |
and if I remember well, it was quite easy in fact |
12:55 |
|
pierrick_ |
but I can't believe we can make Koha work in full UTF-8 using the same technologies (Perl and MySQL) as in 2.2 |
12:55 |
|
kados |
right |
12:55 |
|
kados |
the more I think about it the more I like the idea |
12:56 |
|
pierrick_ |
sorry, I wanted to say "we can't make" |
12:56 |
|
kados |
I think we need to proceed carefully though |
12:56 |
|
kados |
pierrick_: (i understood) |
12:57 |
|
pierrick_ |
the co-worker sitting opposite me tells me PostgreSQL in UTF8 does not work very well under Windows |
12:58 |
|
kados |
interesting |
12:59 |
|
kados |
this is a true dilemma then :-) |
12:59 |
|
paul |
the last possibility being to stay with our 2x1 000 000 problem. |
12:59 |
|
pierrick_ |
because PostgreSQL charset is based on system locale... and under Windows, you only have foo1252 |
12:59 |
|
paul |
* keep mysql collate NOT in utf8 |
12:59 |
|
paul |
as in 2.2 ! |
13:00 |
|
paul |
to test : |
13:00 |
|
kados |
but I think eventually we will need to fix the underlying problem |
13:00 |
|
paul |
* get a 2.2 working |
13:00 |
|
paul |
* comment the utf8 move in updatedatabase |
13:00 |
|
paul |
* updatedatabase & see if it works |
13:00 |
|
kados |
right |
13:01 |
|
pierrick_ |
and change the HTML headers... |
13:01 |
|
paul |
(that's already done in head pierrick_) |
13:01 |
|
paul |
(in PROG templates) |
13:03 |
|
kados |
brb |
13:03 |
|
pierrick_ |
paul: in PROG template, I read charset=utf-8 hardcoded |
13:03 |
|
paul |
yep. |
13:03 |
|
paul |
that's what we want. |
13:04 |
|
paul |
it seems that utf8 works better if : |
13:04 |
|
paul |
* we keep mysql in iso |
13:04 |
|
pierrick_ |
if your data are not stored in UTF-8, you'll have display problems |
13:04 |
|
paul |
* we do nothing in Perl |
13:04 |
|
paul |
that's what I call the 2x1 million $ problem : |
13:04 |
|
pierrick_ |
when do you convert from iso to utf8 for display ? |
13:04 |
|
paul |
the result is 0, as expected. |
13:04 |
|
paul |
but hides 2 problems. |
13:05 |
|
paul |
I don't know. I just see that it works under 2.2 and for Tümer in Turkey ! |
13:05 |
|
paul |
it's dangerous to get something working through 2 things not working, but that's the only solution I see atm |
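One plausible reading of this "two problems that cancel out" situation can be reproduced in a short Python sketch. It is an assumption about what happens, not a diagnosis of Koha itself: UTF-8 bytes are stored in a column the database believes to be ISO-8859-1, and they come back out untouched to a page that declares charset=utf-8.

```python
original = "é"
utf8_bytes = original.encode("utf-8")  # what the application sends

# Error 1: the database treats the bytes as ISO-8859-1 and stores two characters.
stored_as_latin1 = utf8_bytes.decode("latin-1")

# Error 2: on the way out nothing converts them, so the very same bytes reach a
# browser that has been told the page is UTF-8. The two errors cancel out.
bytes_sent = stored_as_latin1.encode("latin-1")
displayed = bytes_sent.decode("utf-8")

# Fixing only one side (a correct latin1 -> utf-8 conversion on output) breaks it.
half_fixed = stored_as_latin1.encode("utf-8").decode("utf-8")

print(displayed, half_fixed)
```

In this sketch the untouched round trip displays the original character, while correcting only one of the two steps produces the familiar two-character garbage in its place, which is why such a setup is fragile.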
13:06 |
|
pierrick_ |
"ça tombe en marche" (sorry kados, don't know how to translate) |
13:06 |
|
pierrick_ |
I hate not understanding :-/ |
13:06 |
|
paul |
me too. that's why I tried to understand. |
13:07 |
|
paul |
exactly : "ca tombe en marche" |
13:07 |
|
pierrick_ |
is there a mail on the mailing list explaining clearly the initial problem ? |
13:08 |
|
paul |
no, there is a collection of mails. |
13:08 |
|
pierrick_ |
I thought "set names 'UTF8';" was the solution |
13:08 |
|
pierrick_ |
(once database correctly converted to utf8) |
13:09 |
|
paul |
I thought so too. |
13:09 |
|
paul |
that's why i added it to Auth.pm |
13:11 |
|
pierrick_ |
Auth.pm means authorities ? Why not in Context.pm where the database connection is made ? |
13:12 |
|
paul |
you're right. |
13:13 |
|
paul |
sorry |
13:13 |
|
paul |
(it was to see if you were following us ;-) ) |
13:13 |
|
pierrick_ |
hehe |
13:15 |
|
pierrick_ |
from where do I re-read IRC and koha-devel to summarize the "MySQL, Perl and UTF-8 issue", I will summarize it on koha-devel |
13:20 |
|
kados |
pierrick_: list archives are on savannah ... but google will find them better for you |
13:20 |
|
kados |
koha.org/irc is the irc log |
13:21 |
|
pierrick_ |
s{from where}{since when}g |
13:24 |
|
hdl |
pierrick_: (I sent you them via email) |
13:24 |
|
hdl |
(filter on utf in devel list.) |
13:26 |
|
pierrick_ |
thank you hdl :-) |
14:01 |
|
pierrick_ |
(just to say in the wind : how easy it was to convert and use UTF8 with Java/Oracle, that's what we made in my previous job... but making C talk with Oracle UTF8 was hard... and I hate Oracle anyway) |
14:08 |
|
kados |
paul: are you still here? |
14:08 |
|
kados |
paul: do your clients use the 'issuingrules' aspect of Koha? |
14:08 |
|
paul |
(on phone) |
14:08 |
|
kados |
paul: it's been broken for several versions now |
14:10 |
|
hdl |
Not for us. |
14:10 |
|
kados |
The biggest bug is that filling in values in the * column doesn't work--it should set default values for all patron types, but it doesn't. |
14:11 |
|
kados |
also, if values are left out, there are no hardcoded defaults |
14:11 |
|
kados |
issuing will just fail |
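The behaviour kados expects (values in the * column acting as defaults, with a hardcoded fallback so issuing never fails outright) could be sketched like this. It is a hypothetical Python illustration, not the code in admin/issuingrules.pl: the data layout, the name `effective_rule` and the fallback values are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

# (issue length in days, maximum number of items); None means the cell was left empty.
Rule = Tuple[Optional[int], Optional[int]]

HARDCODED_DEFAULT = (21, 5)  # placeholder values, chosen only for this example


def effective_rule(rules: Dict[Tuple[str, str], Rule], category: str, itemtype: str) -> Tuple[int, int]:
    """Fill each empty value from the '*' column, then from the hardcoded default."""
    specific = rules.get((category, itemtype), (None, None))
    wildcard = rules.get(("*", itemtype), (None, None))
    return tuple(
        s if s is not None else (w if w is not None else d)
        for s, w, d in zip(specific, wildcard, HARDCODED_DEFAULT)
    )


rules = {("*", "BOOK"): (14, None), ("STAFF", "BOOK"): (None, 10)}
staff = effective_rule(rules, "STAFF", "BOOK")
child = effective_rule(rules, "CHILD", "BOOK")
dvd = effective_rule(rules, "CHILD", "DVD")
print(staff, child, dvd)
```

The check is for an empty cell rather than for a value greater than zero, so an explicit 0 entered by a librarian is kept as a real rule and is not silently replaced by a default.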
14:12 |
|
kados |
hdl: do your clients not experience this behavior? |
14:12 |
|
Sylvain |
kados I've got problems with issuing rules and empty cells |
14:12 |
|
hdl |
They fill in all the cells ;) |
14:12 |
|
Sylvain |
generating null values in issuingrules table |
14:12 |
|
hdl |
except fees. |
14:13 |
|
kados |
so in my view, something is broken if it doesn't work the way it says it works :-) |
14:13 |
|
Sylvain |
I agree that it doesn't work :) |
14:13 |
|
kados |
and issuingrules have been broken for several versions :-) |
14:13 |
|
Sylvain |
I think I made a patch, I have to search for it |
14:13 |
|
kados |
it is seemingly small problems like this that give us a bad name |
14:13 |
|
kados |
it makes Koha appear buggy |
14:14 |
|
kados |
Sylvain: that'd be great! |
14:15 |
|
hdl |
Yes, but the default value could also be a syspref, so that people could set it once and for all; 21,5 is only an example. |
14:15 |
|
kados |
hdl: is there a syspref for this? |
14:16 |
|
hdl |
Not yet. But If Sylvain sends his patch, I could get it worl. |
14:16 |
|
hdl |
s/worl/work |
14:16 |
|
hdl |
And of course, if it is needed. |
14:18 |
|
Sylvain |
in admin/issuingrules.pl there's a line with a # which is if ($maxissueqty > 0) |
14:18 |
|
Sylvain |
I've replaced it with : |
14:18 |
|
Sylvain |
if (($issuelength ne '') and ($maxissueqty ne '')) |
14:18 |
|
Sylvain |
{ |
14:18 |
|
Sylvain |
and for me it works |
14:19 |
|
kados |
Sylvain: I'll try this |
14:20 |
|
kados |
Sylvain: what line? |
14:20 |
|
kados |
ahh ... nevermind |
14:20 |
|
Sylvain |
but it's not tested a lot :) |
14:20 |
|
kados |
Sylvain: I see two instances of this |
14:21 |
|
kados |
Sylvain: did you replace both? |
14:21 |
|
Sylvain |
line 69 only |
14:22 |
|
Sylvain |
but maybe the second one creates another problem |
14:22 |
|
Sylvain |
as far as I remember this change removed the pb with null values |
14:26 |
|
paul_away |
see you on monday, for a new week of Koha hack ! |
14:26 |
|
pierrick_ |
have a good long WE paul :-) |
14:27 |
|
paul_away |
(i'm with a customer tomorrow. not WE !) |
14:27 |
|
paul_away |
(Ouest Provence to say everything...) |
14:28 |
|
pierrick_ |
oh OK, enjoy your 50kms trip :-) |
14:28 |
|
kados |
bye paul_away |
14:28 |
|
pierrick_ |
kados: should I say "journey" or "trip"? |
14:29 |
|
kados |
trip I think |
14:29 |
|
kados |
journey is a bit archaic |
14:29 |
|
pierrick_ |
thanks |
14:37 |
|
pierrick_ |
hdl: I've finished reading the mails you forwarded to me and the associated web links |
14:37 |
|
pierrick_ |
Paul has already done a deep investigation |
14:38 |
|
pierrick_ |
I have to read IRC log in details to understand what remains problematic, but I'll do it tomorrow morning |
14:38 |
|
pierrick_ |
I'm going back home now, dinner out |
15:05 |
|
thd |
owen: why does stylesheet have no file name in rel_2_2 ? |
15:05 |
|
owen |
I'm not sure what you mean |
15:06 |
|
thd |
<link rel="stylesheet" type="text/css" href="/opac-tmpl/npl/en/includes/" /> |
15:06 |
|
thd |
<style type="text/css"> |
15:07 |
|
thd |
@import url(/opac-tmpl/npl/en/includes/); |
15:07 |
|
thd |
</style> |
15:07 |
|
owen |
There are new system preferences for defining those stylesheets |
15:07 |
|
thd |
owen: oh, yes I looked for that but did not see them |
15:07 |
|
owen |
We need to handle this better somehow in the case of new installations, I think |
15:08 |
|
thd |
owen: what are the preferences called? |
15:08 |
|
owen |
for the default, use opac.css for opaclayoutstylesheet |
15:08 |
|
owen |
and colors.css for opaccolorstylesheet |
15:08 |
|
thd |
I see them |
15:09 |
|
thd |
owen: I had a mistaken pathname in my update script recently and had been missing changes which I have only just seen today |
15:10 |
|
thd |
owen: I assumed the changes were there but not working as I had expected. |
15:10 |
|
thd |
thank you owen |
16:32 |
|
owen |
kados: you around? |
16:39 |
|
kados |
owen: yea ... kinda |
16:39 |
|
kados |
what's up? |
16:40 |
|
owen |
http://koha.liblime.com/cgi-bi[…]uest.pl?bib=18398 |
16:40 |
|
owen |
Is it even possible with Koha now to have more than one checkbox in that list of items? |
16:40 |
|
owen |
Is it still possible to have more than one itemtype attached to one biblio? |
16:41 |
|
kados |
this is the tricky bit |
16:41 |
|
kados |
yes, strictly speaking, you can have more than one itemtype attached to a biblio |
16:42 |
|
kados |
however, this behavior isn't supported in the MARC21 version of Koha |
16:42 |
|
kados |
it is in the non-MARC and the UNIMARC |
16:42 |
|
kados |
I'm not sure why we got jipped |
16:42 |
|
kados |
it's one of those things on my list to check out |
16:42 |
|
kados |
so ... thanks for reminding me :-) |
16:43 |
|
owen |
I'm just trying to figure out whether we still need the option to choose an item type when making a reserve (it's one of the things hidden in NPL's production template). So it's just us that can't use it. |
16:43 |
|
thd |
kados: that is not supported on any MARC version of Koha |
16:45 |
|
thd |
owen: there is a workaround that requires a lot of work to set up but unfortunately I do not have time to explain it to you at the moment |
16:45 |
|
owen |
No problem. I'm just trying to clean up the templates where possible (answer: not here) |
16:45 |
|
kados |
it's a major flaw in the current design |
16:46 |
|
kados |
I'd leave it in for now ... hopefully we can fix it |
16:46 |
|
owen |
Did you create a new syspref to show/hide the reading record? |
16:47 |
|
owen |
Oh... opacreadinghistory. |
16:47 |
|
owen |
Can that be used in the intranet too? |
16:54 |
|
owen |
I guess so |
16:55 |
|
kados |
yep it can ... needs support in the templates, that's all |
16:55 |
|
kados |
but might make more sense to have a separate one for the intranet |
16:55 |
|
owen |
So I can hide the link to the reading record page. Should I disable the display of reading history within the reading record page itself? |
16:56 |
|
owen |
i.e. "this page has been disabled by your administrator" |
16:56 |
|
kados |
yea |
17:30 |
|
thd |
kados: are you there? |
17:30 |
|
kados |
thd: kinda |
17:45 |
|
thd |
kados: now that I identified that problem I know there will be a need for a routine to translate the illegal ISO-8859 records people have into UTF-8 |
17:46 |
|
thd |
kados: That will be a little tricky because the leader will always claim the encoding is in another character set |
17:48 |
|
thd |
s/another/a legal/ |
17:49 |
|
kados |
I didn't think that anyone would be so foolist |
17:49 |
|
kados |
foolish :-) |
17:49 |
|
kados |
as to create MARC in iso-8859 :) |
17:51 |
|
thd |
kados: In fact all of paul's customers may have that problem |
17:53 |
|
thd |
kados: BNF can export records in the illegal ISO-8859-1 character set while the encoding still shows ISO-5426 |
17:53 |
|
kados |
strange |
17:53 |
|
kados |
if you can't rely on the leader there's no way I can think of to auto-sense what charset you're working with |
17:54 |
|
thd |
kados: that is a great help for systems that are UTF-8 challenged |
17:54 |
|
thd |
kados: However, it is an additional problem for migration to UTF-8 |
17:56 |
|
thd |
kados: The solution is not especially difficult even when the record encoding value does not match the actual encoding |
17:56 |
|
thd |
kados: I have seen code that tests for question marks and then is able to try to guess the encoding. |
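When the leader cannot be trusted, one simple way to guess the encoding is to try a strict UTF-8 decode first. The Python sketch below shows that approach, which is a different technique from the question-mark test thd mentions. It is a heuristic under stated assumptions: the function name `decode_guessing` is hypothetical, and it only distinguishes UTF-8 from ISO-8859-1, not MARC-8 or ISO-5426.

```python
from typing import Tuple


def decode_guessing(raw: bytes) -> Tuple[str, str]:
    """Return (text, encoding), trying strict UTF-8 first and ISO-8859-1 second."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Every byte sequence is valid ISO-8859-1, so this fallback never fails;
        # it is a guess, not a proof.
        return raw.decode("iso-8859-1"), "iso-8859-1"


from_utf8 = decode_guessing("Tümer".encode("utf-8"))
from_latin1 = decode_guessing("Tümer".encode("iso-8859-1"))
print(from_utf8, from_latin1)
```

The guess works because accented ISO-8859-1 text is rarely valid UTF-8 by accident; pure ASCII data decodes identically either way, so for such records the reported encoding does not matter.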
18:45 |
|
thd |
kados: I should not have looked now but the source records were MARC-8 and they have been translated into UTF-8, although nothing changed the leader encoding for 000/09. |
18:46 |
|
kados |
interesting |
18:46 |
|
kados |
who translated them? |
18:46 |
|
thd |
kados: However my problem is probably that Perl refuses to send them to Apache as UTF-8 without changing my locale |
18:46 |
|
thd |
kados: Koha translated them |
18:46 |
|
kados |
so you're doing original cataloging? |
18:47 |
|
kados |
it didn't change the leader to UTF-8? |
18:47 |
|
kados |
ahh ... what version of MARC::File::XML are you using? |
18:47 |
|
kados |
upgrade to 0.82 and test again |
18:47 |
|
thd |
kados: No these are records captured with my test YAZ/PHP Z39.50 client |
18:48 |
|
kados |
but you're manually copy/pasting them into the Koha editor? |
18:48 |
|
thd |
kados: I can confirm that the data in MySQL is correctly encoded in UTF-8 |
18:49 |
|
thd |
kados: No I used bulkmarcimport.pl |
18:50 |
|
thd |
kados: Everything should at least look fine except that Perl is telling Apache that I am sending ISO-8859 |
18:51 |
|
thd |
kados: Yet PHP does not care what my locale is for sending the data to Apache correctly |
19:01 |
|
kados |
thd: you mean apache is telling perl that you're sending iso-8859 |
19:01 |
|
kados |
thd: I don't see how you can interact directly with perl on a browser |
19:04 |
|
thd |
kados: I am probably not describing it correctly, but PHP web applications work fine with UTF-8 on my system, so Perl would seem to be the problem. |
19:06 |
|
thd |
kados: The problem could be specific to Koha, I will have to test later if I create a UTF-8 page in Perl outside of Koha to see whether that works correctly. |
19:18 |
|
thd |
kados: I strongly suspect Perl generally, because there is a design issue that prevents it from working with Unicode as flexibly as more recently introduced or recently modified languages. The issue derives from Perl's origin, without any thought for multi-byte character sets. That is one thing that Perl 6 is intended to remedy. |
22:35 |
|
thd |
kados: Koha is responsible for sending the characters in conformance to my locale encoding using some feature of Perl most likely. This is what I had proposed to develop as part of a configurable page serving feature. MARC::Charset would not be required for that. |
22:43 |
|
thd |
kados: the data in the XHTML is ISO-8859 but the data in MySQL is UTF-8. Apache cannot be responsible. Apache is in fact using UTF-8 encoding as directed but the data is ISO-8859. |
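The mismatch described here (ISO-8859 bytes served under a UTF-8 header) can be reproduced in a few lines. This is a minimal Python sketch, not Koha code; the `title` value and the header string are illustrative assumptions.

```python
title = "C\u00e9zanne"

# What the page body contains if the text is written out as ISO-8859-1.
body_latin1 = title.encode("iso-8859-1")      # b'C\xe9zanne'

# What a UTF-8 decoder makes of those bytes: the lone 0xE9 byte is not
# valid UTF-8, so it is replaced with U+FFFD (the "question mark" glyph).
seen_by_browser = body_latin1.decode("utf-8", errors="replace")

# The consistent case: the body bytes and the declared charset agree.
body_utf8 = title.encode("utf-8")             # b'C\xc3\xa9zanne'
header = "Content-Type: text/html; charset=utf-8"

print(repr(seen_by_browser))
```

If the database really holds UTF-8, as stated above, the bytes must be getting re-encoded somewhere between the database and the page output, since the web server only passes on what the script emits.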
23:35 |
|
kados |
thd: update your addbiblio.pl |
23:35 |
|
kados |
thd: I just committed a fix for your issue (I think) |
23:35 |
|
kados |
thd: also, if you're around, could you explain to me the different character sets that UNIMARC uses? |
23:35 |
|
thd |
ok updating now |
23:36 |
|
thd |
kados: There are many, but at least full Unicode is defined |
23:36 |
|
kados |
but I mean: what character sets (other than utf-8 and MARC-8) are UNIMARC records likely to be downloaded in |
23:37 |
|
kados |
or uploaded into the reservoir as |
23:37 |
|
thd |
kados: no MARC-8 in UNIMARC |
23:37 |
|
kados |
just utf-8 then? |
23:39 |
|
thd |
kados: paul's users as I had said have been obtaining records encoded in ISO-8859 that should have been ISO-5426 |
23:40 |
|
kados |
that's really tricky |
23:40 |
|
kados |
anyway, did the fix work for you? |
23:42 |
|
thd |
kados: I interrupted the update to bring you |
23:43 |
|
thd |
UNIMARC 100 $a is a fixed field defining the character sets in a manner similar to 000/09 in MARC 21 |
23:43 |
|
thd |
$a/26-29 Character Sets (Mandatory) |
23:43 |
|
kados |
just update the one file |
23:43 |
|
thd |
Two two-character codes designating the principal graphic character sets used in communication of the record. Positions 26-27 designate the G0 set and positions 28-29 designate the G1 set. If a G1 set is not needed, positions 28-29 contain blanks. For further explanation of character coding see Appendix J. The following two-character codes are to be used. They will be augmented as required. |
23:43 |
|
thd |
01 = ISO 646, IRV version (basic Latin set) |
23:43 |
|
thd |
02 = ISO Registration # 37 (basic Cyrillic set) |
23:43 |
|
thd |
03 = ISO 5426 (extended Latin set) |
23:43 |
|
thd |
04 = ISO DIS 5427 (extended Cyrillic set) |
23:43 |
|
thd |
05 = ISO 5428 (Greek set) |
23:43 |
|
thd |
06 = ISO 6438 (African coded character set) |
23:43 |
|
thd |
07 = ISO 10586 (Georgian set) |
23:43 |
|
thd |
08 = ISO 8957 (Hebrew set) Table 1 |
23:43 |
|
thd |
09 = ISO 8957 (Hebrew set) Table 2 |
23:43 |
|
thd |
10 = [Reserved] |
23:43 |
|
thd |
11 = ISO 5426-2 (Latin characters used in minor European languages and obsolete typography) |
23:43 |
|
thd |
50 = ISO 10646 Level 3 (Unicode) |
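The code list above maps directly onto a lookup table. The following Python sketch reads positions 26-29 of a 100 $a string; the function name and the sample value are hypothetical, and only the code table comes from the list quoted above.

```python
# Codes for UNIMARC 100 $a positions 26-29, as listed above.
UNIMARC_CHARSETS = {
    "01": "ISO 646, IRV version (basic Latin set)",
    "02": "ISO Registration # 37 (basic Cyrillic set)",
    "03": "ISO 5426 (extended Latin set)",
    "04": "ISO DIS 5427 (extended Cyrillic set)",
    "05": "ISO 5428 (Greek set)",
    "06": "ISO 6438 (African coded character set)",
    "07": "ISO 10586 (Georgian set)",
    "08": "ISO 8957 (Hebrew set) Table 1",
    "09": "ISO 8957 (Hebrew set) Table 2",
    "11": "ISO 5426-2 (Latin characters used in minor European "
          "languages and obsolete typography)",
    "50": "ISO 10646 Level 3 (Unicode)",
}


def declared_charsets(field_100a: str):
    """Return (G0, G1) descriptions declared in a UNIMARC 100 $a string.

    G1 is None when positions 28-29 are blank. Unknown codes are
    returned unchanged so the caller can decide what to do with them.
    """
    if len(field_100a) < 30:
        raise ValueError("100 $a is too short to hold positions 26-29")
    g0_code = field_100a[26:28]
    g1_code = field_100a[28:30]
    g0 = UNIMARC_CHARSETS.get(g0_code, g0_code)
    g1 = None if g1_code.strip() == "" else UNIMARC_CHARSETS.get(g1_code, g1_code)
    return g0, g1


# Hypothetical 100 $a value: only positions 26-29 matter here.
sample = "x" * 26 + "0103" + "x" * 6
print(declared_charsets(sample))
```

This only reports what the record declares. As discussed in this log, a record can declare ISO 5426 while actually carrying ISO-8859-1 bytes, so the result is a hint rather than a guarantee.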
23:44 |
|
kados |
cvs update addbiblio.pl |
23:44 |
|
thd |
kados: I know but I have to put it in the right place so this is faster |
23:44 |
|
kados |
shit ... that's a lot of encodings |
23:44 |
|
kados |
I note that 8859's not on that list |
23:45 |
|
thd |
kados: French users only have to worry about ASCII and ISO-5426 |
23:45 |
|
kados |
well, so Koha's far from supporting UNIMARC |
23:45 |
|
kados |
at least in terms of encoding |
23:45 |
|
kados |
MARC::Charset only knows how to deal with MARC-8 and UTF-8 |
23:45 |
|
kados |
so UNIMARC's in trouble :-) |
23:46 |
|
thd |
kados: there is MAB::Encode or whatever it is called for ISO-5426 |
23:46 |
|
kados |
thd: not ascii according to the list you posted |
23:46 |
|
kados |
thd: ascii _is_ iso-8859 |
23:47 |
|
thd |
kados ASCII is in there as an ISO standard |
23:48 |
|
thd |
kados: ISO 8859 has Latin characters past 128 which makes it more than ASCII |
23:48 |
|
kados |
thd: well ... I guess ASCII is a subset of 8859 |
23:48 |
|
kados |
righ |
23:48 |
|
kados |
t |
23:49 |
|
thd |
ASCII is ISO-646 |
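The relationship between ASCII (ISO 646) and ISO 8859 can be checked directly. This small Python sketch uses only standard codecs; the helper name `is_ascii` is an illustrative assumption.

```python
# Bytes 0-127 decode identically as ASCII, ISO-8859-1 and UTF-8.
plain = bytes(range(128))
assert plain.decode("ascii") == plain.decode("iso-8859-1") == plain.decode("utf-8")

# Above 127 the encodings part ways: e-acute is one byte in ISO-8859-1,
# two bytes in UTF-8, and not representable in ASCII at all.
e_acute = "\u00e9"
latin1_bytes = e_acute.encode("iso-8859-1")   # b'\xe9'
utf8_bytes = e_acute.encode("utf-8")          # b'\xc3\xa9'


def is_ascii(data: bytes) -> bool:
    """True when every byte is below 128."""
    return all(b < 128 for b in data)


print(is_ascii(b"Cezanne"), is_ascii(latin1_bytes))
```

So a record containing only unaccented text is byte-for-byte the same in all three encodings, and the encoding question only arises once a byte above 127 appears.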
23:49 |
|
kados |
thd: where's MAB::Encode? |
23:49 |
|
thd |
kados: CPAN |
23:49 |
|
kados |
don't see it |
23:49 |
|
thd |
kados: It is described as Alpha but I have never seen actual problem reports |
23:50 |
|
thd |
not that I really looked |
23:50 |
|
kados |
ahh ... Encode::MAB |
23:51 |
|
thd |
http://search.cpan.org/~andk/M[…]ib/Encode/MAB2.pm |
23:52 |
|
kados |
thd: so I check character position 26-27 and 28-29 in UNIMARC, if 26-27 are set to '01' I should use Encode::MAB2? |
23:52 |
|
kados |
to fix the encoding? |
23:53 |
|
kados |
hmmm ... it can't go to utf-8 |
23:53 |
|
thd |
kados: Except that BNF does not change that when sending records in ISO-8859-1 |
23:53 |
|
kados |
yowser |
23:53 |
|
thd |
kados: They can hardly set it to an undefined value |
23:54 |
|
kados |
there's really nothing that can be done about that |
23:54 |
|
thd |
kados: yes in fact there may be a check |
23:54 |
|
kados |
we can't be expected to detect the record encoding |
23:55 |
|
thd |
kados: I posted before about guessing the encoding and then checking to see if an error is produced after temporary parsing |
23:55 |
|
kados |
heh |
23:55 |
|
thd |
kados: I have seen routines that essentially search for question marks where they should not appear. |
23:56 |
|
kados |
hehehe |
23:56 |
|
kados |
you're nuts :-) |
23:56 |
|
thd |
kados: I assume those methods are not foolproof |
23:57 |
|
kados |
well ... it might be a good customization |
23:57 |
|
kados |
for special cases |
23:57 |
|
kados |
I certainly wouldn't want to have that be default behavior |
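The guess-and-check approach discussed above might look roughly like this in Python. It is a sketch under stated assumptions: both function names are hypothetical, only UTF-8 and ISO-8859-1 are tried because Python's standard library has no ISO 5426 or MARC-8 codec, and the question-mark test is a crude stand-in for the routines thd mentions.

```python
def guess_decode(raw: bytes):
    """Try strict UTF-8 first, then fall back to ISO-8859-1.

    Returns (text, encoding_used). ISO-8859-1 accepts every byte value,
    so the fallback never fails, which also means it proves nothing.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1"), "iso-8859-1"


def looks_damaged(text: str) -> bool:
    """Crude check: a '?' or U+FFFD sandwiched between two letters."""
    for i in range(1, len(text) - 1):
        if (text[i] in "?\ufffd"
                and text[i - 1].isalpha()
                and text[i + 1].isalpha()):
            return True
    return False


print(guess_decode(b"C\xe9zanne"))
print(looks_damaged("C?zanne"))
```

As noted in the conversation, this kind of heuristic is not foolproof, which is a reason to keep it as an opt-in step for special cases rather than default behavior.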
23:59 |
|
thd |
kados: I tried one of those methods in PHP just for fun but I was not feeding it good data |
23:59 |
|
kados |
thd: did my fix work for you? |
00:00 |
|
thd |
kados: not with existing data. I am using bulkmarcimport.pl -d right now |
00:01 |
|
thd |
kados: who has been partially normalising the searches? |
00:01 |
|
kados |
ahh |
00:01 |
|
kados |
normalizing? |
00:02 |
|
kados |
so you're saying that rel_2_2 doesn't currently handle imports from bulkmarcimport ? |
00:02 |
|
kados |
well ... encoding that is? |
00:02 |
|
kados |
could you send me the records you're importing |
00:02 |
|
kados |
so i can fix it? |
00:02 |
|
kados |
thd: ? |
00:03 |
|
thd |
kados: I found before records that I should not have found when searching Cezanne instead of Cézanne |
00:03 |
|
kados |
that's mysql being smart |
00:03 |
|
kados |
is that your only problem? |
00:04 |
|
thd |
kados: no that would be a benefit if I could find records by searching with Cézanne as well |
00:05 |
|
kados |
thd: send me the records |
00:05 |
|
thd |
kados: the correct diacritics failed even when I supplied the UTF-8 string |
00:05 |
|
kados |
thd: I'll fix it :-) |
00:06 |
|
thd |
kados: Let me get a better copy of the records; they are full of redundancies from multiple targets, making it difficult to search them for one |
00:06 |
|
kados |
thd: but I don't have much time tonight, so you better make it quick |
00:06 |
|
thd |
kados: I will be fast |
00:07 |
|
thd |
nothing has been fixed after re importing the records. |
00:08 |
|
kados |
thd: I have a fix for you |
00:08 |
|
kados |
open bulkmarcimport.pl |
00:08 |
|
thd |
kados: ok one moment |
00:10 |
|
thd |
open |
00:10 |
|
kados |
thd: line 79 |
00:10 |
|
kados |
add the following after it: |
00:10 |
|
kados |
my $uxml = $record->as_xml; |
00:10 |
|
kados |
$record = MARC::Record::new_from_xml($uxml, 'UTF-8'); |
00:10 |
|
kados |
try that |
00:11 |
|
thd |
while ( my $record = $batch->next() ) { |
00:11 |
|
kados |
yep ... right after that line |
00:11 |
|
thd |
that is line 79 |
00:11 |
|
kados |
while ( my $record = $batch->next() ) { |
00:11 |
|
kados |
my $uxml = $record->as_xml; |
00:11 |
|
kados |
$record = MARC::Record::new_from_xml($uxml, 'UTF-8'); |
00:11 |
|
kados |
save the file |
00:11 |
|
kados |
and re-import |
00:12 |
|
kados |
see if that fixes it |
00:13 |
|
thd |
reimporting now |
00:16 |
|
thd |
kados: that did something wired |
00:16 |
|
thd |
s/wired/weired/ |
00:18 |
|
thd |
kados: I have the accent one character over now: Cézanne is now Ce\x{017a}anne |
00:19 |
|
thd |
kados: I think that did not go through right but it looks right with the acute accent except on the wrong character |
00:20 |
|
thd |
kados: My spell checker corrupted my post |
00:20 |
|
kados |
heh |
00:20 |
|
kados |
so what'd it do? |
00:20 |
|
kados |
can I look at it? |
00:23 |
|
kados |
thd: ? |
00:25 |
|
kados |
thd: these are all marc-8 ... or at least claim to be |
00:27 |
|
kados |
http://opac.liblime.com/cgi-bi[…]tail.pl?bib=23783 |
00:27 |
|
kados |
I can search on Cézanne as well |
00:37 |
|
thd |
kados: Ce\x{0301}zanne |
00:37 |
|
thd |
hehe spell checker corruption again |
00:38 |
|
thd |
kados: my YAZ/PHP client page is in UTF-8 now |
00:39 |
|
thd |
and was for the past few months |
00:39 |
|
thd |
kados: It saves the records in raw encoding which is MARC-8 for those 5 |
00:40 |
|
thd |
kados: what is with the wandering accent though |
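The string `Ce\x{0301}zanne` quoted above is the decomposed spelling: a plain `e` followed by U+0301 COMBINING ACUTE ACCENT. The Python sketch below shows how it relates to the precomposed form. Suggesting NFC normalization is an assumption about what might help here, not something proposed in the conversation.

```python
import unicodedata

decomposed = "Ce\u0301zanne"    # 'e' followed by COMBINING ACUTE ACCENT
precomposed = "C\u00e9zanne"    # single LATIN SMALL LETTER E WITH ACUTE

# The two spellings are meant to render identically but are different strings.
assert decomposed != precomposed
assert len(decomposed) == 8 and len(precomposed) == 7


def to_nfc(text: str) -> str:
    """Compose base letters and combining marks where a composed form exists."""
    return unicodedata.normalize("NFC", text)


print(to_nfc(decomposed) == precomposed)
```

With the decomposed form, the renderer has to place the accent itself, so a font or browser that handles combining marks badly can draw it over a neighbouring letter even though the stored data is correct.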
00:42 |
|
kados |
no idea |
00:42 |
|
kados |
it's really weird |
00:42 |
|
thd |
kados: before your fix for bulkmarcimport.pl I had consistent ISO-8859 content in the Koha XHTML page with a UTF-8 header |
00:42 |
|
kados |
it's also really weird that we can search for it :-) |
00:43 |
|
kados |
now what do you have? |
00:43 |
|
thd |
kados: It would have looked fine if the header had been ISO-8859 |
00:44 |
|
kados |
thd: I need some clarity |
00:44 |
|
kados |
thd: are these records MARC-8 or ISO-8859? |
00:44 |
|
thd |
kados: now I have wandering accents which are clearly UTF-8 but Koha keeps changing the characters depending on the type of view |
00:44 |
|
kados |
thd: or don't you know? |
00:44 |
|
kados |
thd: really? |
00:44 |
|
kados |
thd: what's an example? |
00:45 |
|
thd |
kados: the records were in MARC-8 before import into Koha |
00:45 |
|
thd |
kados: although I did not check each one |
00:45 |
|
kados |
ok ... so they were converted to utf-8 |
00:45 |
|
kados |
and now they are in utf-8 |
00:45 |
|
kados |
are the leaders correct? |
00:45 |
|
thd |
kados: yes UTF-8 strangeness |
00:46 |
|
kados |
leaders are correct |
00:46 |
|
kados |
thd: check the MARC view |
00:46 |
|
thd |
kados: leaders were not correct before I will check now |
00:46 |
|
kados |
thd: the accent's in the right place |
00:46 |
|
kados |
http://opac.liblime.com/cgi-bi[…]tail.pl?bib=23783 |
00:46 |
|
kados |
02131cam a2200409 a 4500 |
00:46 |
|
kados |
looks right to me |
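The leader pasted above can be checked mechanically. In MARC 21, leader position 09 is the character coding scheme: a blank means MARC-8 and `a` means UCS/Unicode. The Python sketch below reads that position; the function name is hypothetical, and it assumes the leader was pasted with its spacing intact (24 characters).

```python
def leader_encoding(leader: str) -> str:
    """Report the character coding scheme declared in MARC 21 leader/09."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is exactly 24 characters long")
    scheme = leader[9]
    if scheme == "a":
        return "UCS/Unicode"
    if scheme == " ":
        return "MARC-8"
    return "unknown"


leader = "02131cam a2200409 a 4500"
print(leader_encoding(leader))   # position 09 is 'a'
print(int(leader[0:5]))          # positions 00-04 hold the record length
```

Like the UNIMARC field, this is only the declared value. It says nothing about whether the bytes in the record actually match it.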
00:47 |
|
kados |
also, in the MARC view the accent is in the right place! :-) |
00:47 |
|
kados |
wtf |
00:49 |
|
thd |
kados: was your addbiblio.pl fix for the leader? |
00:51 |
|
kados |
the leaders for records going in to koha should be automatically fixed now from bulkmarcimport.pl and addbiblio.pl |
00:51 |
|
thd |
kados: did you see the mailing list post, maybe on koha-devel, about Koha UTF-8 code causing a problem in Portuguese? |
00:51 |
|
kados |
(and I'm about to fix the leader 'length' setting too) |
00:52 |
|
kados |
recent? |
00:52 |
|
thd |
kados: it was a few weeks ago |
00:52 |
|
kados |
I missed it |
00:52 |
|
kados |
thd: let's focus on our issue |
00:53 |
|
kados |
thd: do you see that in the MARC view the accents are corect/ |
00:53 |
|
thd |
kados: I did not pay close attention as it was just one record in Portuguese |
00:53 |
|
kados |
? |
00:53 |
|
kados |
correct even? |
00:53 |
|
thd |
s/record/letter/ |
00:53 |
|
thd |
yes correct |
00:53 |
|
kados |
I bet I know why |
00:53 |
|
kados |
the 'normal' view is pulling the records from the koha tables |
00:54 |
|
kados |
the marc view from the marc* tables |
00:54 |
|
thd |
Oh and the tables are not UTF-8? |
00:54 |
|
kados |
none of the tables are utf-8 |
00:54 |
|
kados |
:-) |
00:54 |
|
kados |
I |
00:54 |
|
thd |
s/tables/original Koha tables/ |
00:55 |
|
kados |
none of the tables in koha 2.2 are utf-8 |
00:55 |
|
thd |
kados: my Koha MARC tables have been UTF-9 for months |
00:55 |
|
thd |
s/9/8/ |
00:56 |
|
kados |
new theory |
00:56 |
|
kados |
the accent only shifts if the text is a link |
00:57 |
|
thd |
kados: I know that the values were correct previously before your fixes in marc_subfield_table |
00:57 |
|
kados |
thd: seem right to you? |
00:57 |
|
thd |
kados: yes |
00:57 |
|
thd |
s/subfields/ |
00:57 |
|
thd |
s/subfield/subfields/ |
00:58 |
|
thd |
kados: that is where the MARC record data live |
01:00 |
|
thd |
MySQL does not know the difference between UTF-9 and ISO-8859 except in search indexing |
01:00 |
|
thd |
s/9/8/ |
01:01 |
|
kados |
thd: so have we solved all your problems? |
01:01 |
|
kados |
except the strange fact that links shift the accents (which I bet is a browser problem) |
01:02 |
|
thd |
kados: If we convert the original Koha tables all will be fine and happy. |
01:02 |
|
kados |
convert? |
01:02 |
|
kados |
convert to what? |
01:03 |
|
thd |
ALTER TABLE whatever to UTF-8 |
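MySQL does have a statement for this, `ALTER TABLE ... CONVERT TO CHARACTER SET ...`. The Python sketch below only builds the statements for a list of tables; the helper name is hypothetical and the table names are examples taken from this conversation.

```python
import re

# Conservative whitelist for table and charset names.
_IDENTIFIER = re.compile(r"^[A-Za-z0-9_]+$")


def convert_statements(tables, charset="utf8"):
    """Build one ALTER TABLE ... CONVERT TO CHARACTER SET statement per table."""
    if not _IDENTIFIER.match(charset):
        raise ValueError(f"unexpected charset name: {charset!r}")
    statements = []
    for name in tables:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unexpected table name: {name!r}")
        statements.append(
            f"ALTER TABLE `{name}` CONVERT TO CHARACTER SET {charset};"
        )
    return statements


for statement in convert_statements(["biblio", "marc_subfield_table"]):
    print(statement)
```

One caveat: `CONVERT TO` transcodes the stored values on the assumption that the column's current character set is accurate. If UTF-8 bytes were already sitting in columns declared as latin1, the conversion would double-encode them, which is one reason to test on a copy and reimport as suggested here.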
01:03 |
|
kados |
no table in koha 2.2 is in utf-8 |
01:03 |
|
thd |
then reimport |
01:03 |
|
kados |
the marc tables aren't in utf-8 currently |
01:03 |
|
thd |
kados: Some are on my system |
01:03 |
|
kados |
do they work properly? |
01:04 |
|
thd |
kados: In fact I rebuilt the current Koha DB with UTF-8 default |
01:05 |
|
thd |
kados: actually everything should have been UTF-8 except for update changes from CVS |
01:06 |
|
thd |
kados: certainly marc_subfields_table with the MARC data had been fine |
01:06 |
|
thd |
kados: i had taken my original rel_2_2 dump and changed all the encodings from ISO-8859 to UTF-8 |
01:07 |
|
thd |
kados: then I imported that into a database built with UTF-8 defaults |
01:08 |
|
kados |
hmmm |
01:08 |
|
kados |
so what you're telling me is that mysql utf-8 works fine for you right? |
01:08 |
|
kados |
are all your tables utf-8? |
01:08 |
|
kados |
(how did you convert them?) |
01:09 |
|
thd |
kados: then I dropped the original ISO-8859 database and had been very happy except that I had confused the CVS update path for a few weeks |
01:10 |
|
thd |
kados: so I could not see expected problems because MARC-8 data was still MARC-8 inside Koha until I fixed the CVS update path this morning |
01:12 |
|
kados |
i don't understand why utf-8 works fine on my mysql since I haven't changed the tables to handle utf-8 :-) |
01:13 |
|
thd |
kados: it was working fine for Carol Ku except for problems that I had supposed to be related to the previous lack of MARC-8 support |
01:13 |
|
thd |
kados: not knowing Chinese I could not tell |
01:14 |
|
kados |
thd: if you have any MARC records in iso8859 (MARC21) I'd be interested in seeing what happens when they are imported under the new scheme |
01:15 |
|
thd |
kados: those are illegal |
01:15 |
|
thd |
kados: what kind of criminal do you think I am :) |
01:15 |
|
kados |
yea but they exist right? |
01:15 |
|
kados |
hehe |
01:16 |
|
thd |
kados: well I suggested that they did earlier |
01:16 |
|
kados |
anyway ... we digress |
01:16 |
|
thd |
kados: I have seen Z39.50 servers claiming to have records in ISO-8859 |
01:17 |
|
thd |
for MARC 21 |
01:18 |
|
kados |
so the last test I'd like to try tonight |
01:20 |
|
thd |
kados: my Koha tables are UTF-8 but they have bad values |
01:20 |
|
kados |
thd: could you tell me whether the new bulkimport.pl correctly inserts data into those tables? |
01:21 |
|
thd |
kados: They do not even look like UTF-8 values |
01:21 |
|
kados |
(ie, are the bad values from previous imports, or are they from current imports?) |
01:22 |
|
thd |
kados: Do you mean bulkmarcimport.pl that we had fixed by your suggestion? |
01:22 |
|
kados |
yes |
01:22 |
|
thd |
kados: yes I reimported with the delete option afterwards |
01:23 |
|
kados |
good, so we are all on the same page |
01:23 |
|
kados |
none of us can import utf-8 correctly using utf-8 encoded tables |
01:24 |
|
thd |
kados: My claim about ghost data even after the delete option was applied came from an old Koha MARC export that I had in the import directory but have subsequently deleted so it is no more. |
01:24 |
|
kados |
ahh |
01:24 |
|
kados |
good news |
01:24 |
|
kados |
thd: what's your impression of the progress we've made between 2.2.5 and 2.2.6? |
01:25 |
|
thd |
kados: huge number of bug fixes for show stopping bugs if only MARC-8 worked correctly |
01:25 |
|
kados |
we don't need MARC-8 to work now that we convert everything to UTF-8 right? |
01:26 |
|
thd |
kados: yes except we only need to track down the problem where the original Koha tables must now be getting improperly converted MARC-8 |
01:27 |
|
kados |
thd: they aren't |
01:27 |
|
kados |
thd: it's just that firefox is formatting links strangely |
01:27 |
|
kados |
thd: when there are accented chars in them |
01:27 |
|
thd |
kados: that is a Firefox bug? |
01:27 |
|
kados |
thd: dunno |
01:28 |
|
thd |
kados: I can see the data in the original Koha tables and it is wrong |
01:28 |
|
kados |
er? |
01:28 |
|
kados |
what about the marc tables? |
01:29 |
|
thd |
Kados: marc_subfields_table looks fine |
01:30 |
|
kados |
the koha tables look fine to me: |
01:30 |
|
kados |
mysql> select * from biblio where biblionumber='23783'; |
01:31 |
|
kados |
| 23783 | NULL | Cézanne & Poussin : | NULL | NULL | NULL | NULL | 1993 | 20060309192428 | NULL | |
01:31 |
|
kados |
accent is on the e |
01:31 |
|
thd |
kados: then the links are bad from a Firefox bug for you? |
01:31 |
|
kados |
I think it's just a trick of the eyes |
01:32 |
|
kados |
for some reason, the font we're using makes it look like the accents are on the 'z' |
01:32 |
|
thd |
kados: I do not see the accent on #koha |
01:32 |
|
kados |
(yea, this channel is iso-8859) |
01:32 |
|
kados |
the font only makes it look that way when it's a link |
01:32 |
|
thd |
kados: that was my first thought but my eyes are better than that or do I need glasses for my perfect vision now? |
01:32 |
|
kados |
it's right in the marc tables and it's right in the koha tables |
01:33 |
|
kados |
and it's right on the normal view for the heading and it's right in the marc view |
01:33 |
|
kados |
it's right everywhere but the links |
01:33 |
|
kados |
and that's a font or browser issue ... let's move on :-) |
01:33 |
|
thd |
kados: code is corrupting the values as in the complaint about a Portuguese letter |
01:34 |
|
kados |
where? |
01:37 |
|
kados |
I don't see any corruption |
01:37 |
|
thd |
not on Google |
01:37 |
|
kados |
hmmm |
01:37 |
|
thd |
kados: I will see if I can produce a link for you from my system |
01:39 |
|
kados |
thd: view source on the opac-detail page |
01:41 |
|
thd |
kados: what do you see in the source? |
01:42 |
|
kados |
properly accented characters |
01:42 |
|
thd |
kados: your locale is utf-8? |
01:43 |
|
kados |
thd: i changed the font on the results screen |
01:44 |
|
kados |
http://opac.liblime.com/cgi-bi[…]ha/opac-search.pl |
01:44 |
|
kados |
do a search on CeÌzanne |
01:44 |
|
kados |
like I said before, it's a font / browser issue |
01:44 |
|
kados |
can we close this topic once and for all? :-) |
01:46 |
|
thd |
kados: I have no results from my search |
01:46 |
|
thd |
kados: now I will try searching with UTF-8 |
01:46 |
|
kados |
that's what I wanted you to search with in the first place but I couldn't paste it in correctly :-) |
01:47 |
|
kados |
thd: are you convinced? |
01:48 |
|
thd |
kados: maybe but was it not working previously with no accents? |
01:49 |
|
kados |
what? |
01:49 |
|
kados |
you mean the search? |
01:49 |
|
thd |
kados: when you searched on Liblime with Cezanne no accents did you not find records or was that just my system |
01:49 |
|
thd |
? |
01:50 |
|
kados |
dunno ... I never tried it |
01:50 |
|
kados |
does it still work on your system? |
01:52 |
|
thd |
kados: that was only before you fixed it |
01:52 |
|
thd |
kados: we have to fix Firefox now :) |
01:52 |
|
kados |
weird |
01:53 |
|
kados |
no we have to fix the Verdana font :-) |
01:53 |
|
kados |
that seemed to be the problem |
01:53 |
|
kados |
soon as i switched to sans-serif it worked fine |
01:53 |
|
thd |
kados: so if I force firefox to display a different font it will be cured? |
01:54 |
|
kados |
seems easier to just change the default font in Koha |
01:54 |
|
kados |
but yes, that should work too |
01:54 |
|
kados |
let's move on |
01:54 |
|
kados |
what's next |
01:57 |
|
thd |
kados: that cured it for LibLime but not my system. I will try my own system again later |
01:58 |
|
thd |
kados: I forced a font change in Firefox itself |
01:59 |
|
thd |
kados: Carol had suspected her fonts and did not understand about UTF-8. I guess she was at least partly right. |
01:59 |
|
thd |
s/UTF/MARC/ |
02:00 |
|
thd |
kados: She could create records fine but could not import them. |
02:01 |
|
thd |
kados: we need a routine for detecting and converting the home user locale on quarry submission |
02:02 |
|
thd |
kados: and we also need query normalisation and index normalisation |
02:02 |
|
thd |
s/quarry/query/ |
02:16 |
|
thd |
kados: If I type Cézanne, I should find something from my ISO-8859 locale with query normalisation. I should also find something if I type Cezanne, even though the authority controlled values will always have Cézanne in UTF-8. |
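Query and index normalisation as described here could be sketched as follows in Python. Both function names and the default client charset are assumptions for illustration. The idea is to decode the query from the user's charset, then fold both the query and the indexed value to an accent-free, case-free key.

```python
import unicodedata


def query_bytes_to_text(raw: bytes, client_charset: str = "iso-8859-1") -> str:
    """Decode a submitted query from the charset the client is assumed to use."""
    return raw.decode(client_charset)


def fold(text: str) -> str:
    """Strip combining marks and case so accented and plain spellings match."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


indexed_value = "C\u00e9zanne"                    # stored in UTF-8
query_from_latin1_form = b"C\xe9zanne"            # typed in an ISO-8859 locale
print(fold(query_bytes_to_text(query_from_latin1_form)) == fold(indexed_value))
print(fold("Cezanne") == fold(indexed_value))
```

Folding applies to the search key only. The displayed and authority-controlled value keeps its accents.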
02:19 |
|
thd |
kados: Users of western European languages do not have UTF-8 locales on their home systems nor do many of your potential customers on their office systems. |
02:22 |
|
thd |
kados: Almost no Western European language user wants to send UTF-8 email because it will look like junk to most recipients. |
02:23 |
|
kados |
good points |
02:23 |
|
kados |
but I don't think I can fix that tonight :-) |
02:23 |
|
thd |
kados: no not tonight :) |
02:23 |
|
kados |
in fact, I'm troubleshooting getting the z3950 daemon running on one of my servers |
02:24 |
|
kados |
I run it so rarely that I forget if I'm doing it correctly |
02:25 |
|
kados |
can't get it going |
02:25 |
|
kados |
very strange |
02:25 |
|
thd |
kados: I wish mutt or SquirrelMail would allow reading mail in any encoding and sending in UTF-8 but that is an either or choice for the present. |
02:26 |
|
thd |
kados: do you need my Koha Z39.50 server hints message? |
02:26 |
|
kados |
maybe |
02:26 |
|
thd |
s/server/client/ |
02:27 |
|
kados |
is it on kohadocs? |
02:28 |
|
thd |
kados: No I was going to make it into a FAQ but very few users who had trouble had the patience to get the server running |
02:28 |
|
thd |
s/server/client/ |
02:28 |
|
kados |
heh |
02:28 |
|
kados |
well if _I_ can't get it going ... :-) |
02:30 |
|
kados |
sam:/home/nbbc/koha/intranet/scripts/z3950daemon# ./z3950-daemon-launch.sh |
02:30 |
|
kados |
Koha directory is /home/nbbc/koha/intranet/scripts/z3950daemon |
02:30 |
|
kados |
No directory, logging in with HOME=/ |
02:31 |
|
kados |
what am I forgetting? |
02:34 |
|
kados |
thd: got that faq handy? |
02:35 |
|
thd |
kados: you should have it now |
02:39 |
|
kados |
hmmm ... I still can't get it going |
02:39 |
|
thd |
kados: did you receive my message? |
02:39 |
|
kados |
yep |
02:40 |
|
thd |
kados: It was mostly the starting point for identifying the problem |
02:40 |
|
kados |
yea |
02:41 |
|
thd |
kados: about 5 messages later most users would give up. |
02:41 |
|
thd |
kados: If they had continued then I would actually know what to put in a proper FAQ |
02:42 |
|
kados |
strangely, I have it running fine on another server |
02:43 |
|
kados |
and this server was working too |
02:43 |
|
thd |
kados: what is different about the 2 servers? |
02:43 |
|
kados |
looks like it just died about a week ago and no one noticed |
02:44 |
|
kados |
interestingly: |
02:44 |
|
kados |
# ./processz3950queue |
02:44 |
|
kados |
Bareword "Net::Z3950::RecordSyntax::USMARC" not allowed while "strict subs" in use at ./processz3950queue line 261. |
02:44 |
|
kados |
Bareword "Net::Z3950::RecordSyntax::UNIMARC" not allowed while "strict subs" in use at ./processz3950queue line 262. |
02:44 |
|
kados |
Execution of ./processz3950queue aborted due to compilation errors. |
02:45 |
|
kados |
ahh ... that's my problem |
02:46 |
|
kados |
Net::Z3950 isn't installed :-) |
02:46 |
|
kados |
just Net::Z3950::ZOOM |
02:46 |
|
kados |
which doesn't yet support the old syntax |
02:46 |
|
thd |
:) |
02:46 |
|
kados |
ok ... so I need to rewrite the z3950 client tomorrow :-) |
02:46 |
|
kados |
thanks for your help thd |
02:47 |
|
thd |
kados: you need documentation to write a non-blocking asynchronous client |
02:47 |
|
thd |
you're welcome, kados |
02:47 |
|
thd |
good night |
05:26 |
|
hdl |
hi |
05:28 |
|
hdl |
http://www.ratiatum.com/news29[…]ans_le_calme.html |
05:34 |
|
chris |
hmmm, ill have to babelfish that |
05:35 |
|
chris |
ohh DRM stuff |
05:36 |
|
hdl |
yeah. |
05:37 |
|
hdl |
The National Assembly is voting this through without objection and without listening to the opposing side. |
05:37 |
|
chris |
ohh, that is bad news |
05:39 |
|
hdl |
They withdrew an article of the law at the beginning of the vote, then reintroduced it only to vote against it and to vote for amendments which promote THEIR vision. AWFUL ! |
05:39 |
|
chris |
from what I can tell, it removes any rights to make a private copy ? |
05:41 |
|
chris |
is it kind of like the DMCA in the US? |
05:41 |
|
chris |
(babelfish doesnt do a very good job of translation, and the french i learnt when i was 12, i have all forgotten :-)) |
06:38 |
|
hdl |
chris : In fact it is not kind of DMCA, it is WORSE :5 |
07:59 |
|
hdl |
pierrick_: hi |
07:59 |
|
hdl |
How are you ? |
08:02 |
|
pierrick_ |
Hi hdl, I'm fine, how do you do ? |
08:03 |
|
hdl |
quite good. |
08:03 |
|
pierrick_ |
I'm writing an email to koha-dev telling what tests I've made about Perl/MySQL/UTF-8 |
08:03 |
|
hdl |
Still on zebra import. |
08:03 |
|
hdl |
Does it work better ? |
08:03 |
|
hdl |
Are you clear ? |
08:04 |
|
hdl |
(that is : did you get the problem) |
08:04 |
|
pierrick_ |
(it works so nicely that I don't understand why Paul has spent so much time on this issue) |
08:04 |
|
pierrick_ |
maybe I didn't finally understand what was not working |
08:05 |
|
hdl |
Have you followed DADVSI ? |
08:05 |
|
pierrick_ |
not at all, too confusing for me |
08:06 |
|
pierrick_ |
and I don't really feel concerned since I only listen to the radio... my concern is about free software, and software patents were rejected months ago |
08:08 |
|
hdl |
but they are lurking around. |
08:11 |
|
hdl |
And DADVSI is a matter of culture. French librarians are concerned about it. FSF France is concerned too, since it would push open-source software, which CANNOT have DRM by construction (unless Sun makes its open-source DRM a success and a standard), to the fringe. |
08:11 |
|
hdl |
This is why DADVSI matters for me. |
08:12 |
|
pierrick_ |
I understand |
08:17 |
|
hdl |
Anyway, thanks to your upcoming email, I will be able to work in UTF-8. :) |
08:17 |
|
hdl |
"La vie est belle" ;) |
08:17 |
|
pierrick_ |
maybe I miss the point, we'll see |
08:18 |
|
hdl |
I will tell you if that is the case. |
08:31 |
|
hdl |
can you send a private copy to me... Will be faster : same mail as you, except henridamien |
08:49 |
|
pierrick_ |
sorry, I was not reading IRC, mail was sent |
10:13 |
|
osmoze |
hello |
10:13 |
|
hdl |
pierrick_: Congratulations for set names feature ;) |
10:27 |
|
pierrick_ |
hdl: no no, I didn't find anything more than what Paul found on usenet |
10:27 |
|
pierrick_ |
the only thing I made was to start from scratch with full UTF-8 from the beginning to the end |
10:28 |
|
pierrick_ |
hello osmoze |
10:28 |
|
hdl |
But you did smart testing from scratch. |
10:29 |
|
pierrick_ |
thank you :-) |
10:29 |
|
hdl |
The problem is now to make good use of your results and of your remark on our bases and in Koha. |
10:30 |
|
hdl |
Since at each connection, there is the charset problem. |
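The "set names" feature mentioned above refers to MySQL's `SET NAMES` statement, which applies per connection and therefore has to be issued every time a connection is opened. A minimal Python sketch, assuming a DB-API 2.0 style connection object (the function names are hypothetical, and Koha itself is Perl):

```python
def charset_statement(charset: str = "utf8") -> str:
    """Build the SET NAMES statement for the given connection charset."""
    if not charset.isalnum():
        raise ValueError(f"unexpected charset name: {charset!r}")
    return f"SET NAMES {charset}"


def set_connection_charset(connection, charset: str = "utf8") -> None:
    """Issue SET NAMES on a freshly opened DB-API 2.0 style connection."""
    cursor = connection.cursor()
    try:
        cursor.execute(charset_statement(charset))
    finally:
        cursor.close()


print(charset_statement())
```

Calling a helper like this from the single place where connections are created would avoid having to remember it at every call site.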