Time  Nick  Message
14:58 thd   kados: are you there?
15:00 kados hi thd
15:00 kados thd: what can I do for you? ;-)
15:00 thd   hello
15:00 dewey privet, thd
22:55 kados thd: here are some of the errors:
22:55 kados #161 has a problem: Premature end of data in tag subfield line 49
22:55 kados Premature end of data in tag datafield line 48
22:55 kados Premature end of data in tag record line 7
22:55 kados Premature end of data in tag collection line 2 at /usr/local/lib/perl/5.8.4/XML/LibXML/SAX.pm line 64
22:55 kados  at /usr/local/share/perl/5.8.4/MARC/File/XML.pm line 450
22:57 thd   kados: that is a record parsing error
22:58 thd   we need to get those to Ed so he can fix the underlying library
22:59 kados did he make promises to that effect?
22:59 kados not to me anyway ;-)
23:00 thd   kados: I had a very good feeling from how he put something similar: he would look at reproducible problems which we had made a reasonable effort to trace
23:00 kados what constitutes reasonable effort?
23:01 thd   kados: well testing to be certain that the record itself is not unreasonably misencoded
23:02 kados it doesn't throw an error in marcdump
23:02 kados is that enough testing? :-)
23:03 thd   kados: what happens if you use the Koha rel 2_2 bulkmarcimport.pl with logging set to maximum verbosity, -vv?
23:04 thd   kados: that would use MARC record instead of MARC-XML
23:05 kados there is only one perl method to convert to utf8 using the codetables.xml file provided by LOC
23:05 kados that's provided by MARC::File::XML
23:07 thd   kados: is that what MARC::Charset uses?
23:08 kados no, but it uses MARC::Charset ;-)
23:10 thd   kados: I think that using the MARC-8 to UTF-8 function which I added to bulkmarcimport.pl may convert the record without triggering SAX errors
23:11 thd   kados: If we can convert the encoding first without involving XML we eliminate most sources of error
23:12 kados thd: but your code doesn't utilize the codetables.xml file
23:12 kados thd: that MARC::Charset uses
23:12 kados at least I don't think it does
23:12 thd   my code uses MARC::Charset
23:13 kados and I'm also not sure if it properly handles the conversion of multibyte characters, especially handling of 'combining characters' since MARC-8 and UTF-8 ordering is so different
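The ordering difference kados mentions is that MARC-8 stores a combining diacritic before its base letter, while Unicode stores it after. A minimal sketch of the reordering, assuming a two-entry excerpt of the MARC-8 combining table (the %combining hash and the reorder_marc8 function are hypothetical illustrations, not part of MARC::Charset or codetables.xml), might look like this:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Hypothetical two-entry excerpt of the MARC-8 (ANSEL) combining marks.
# A real converter would load the full table from codetables.xml.
my %combining = (
    "\xE1" => "\x{0300}",    # combining grave accent
    "\xE2" => "\x{0301}",    # combining acute accent
);

# Move each combining mark from before its base letter (MARC-8 order)
# to after it (Unicode order), then compose with NFC.
sub reorder_marc8 {
    my ($bytes) = @_;
    my ( $out, $pending ) = ( '', '' );
    for my $byte ( split //, $bytes ) {
        if ( exists $combining{$byte} ) {
            $pending .= $combining{$byte};    # hold marks until the base arrives
        }
        else {
            $out .= $byte . $pending;         # base letter first, then its marks
            $pending = '';
        }
    }
    return NFC( $out . $pending );
}

my $converted = reorder_marc8("caf\xE2e");    # MARC-8 order: acute, then 'e'
printf "%vX\n", $converted;                   # code points, dot-separated hex
```

This only handles ASCII plus the two marks in the table; any other high byte passes through unchanged, which is exactly the kind of gap a full code table is meant to close.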
23:13 thd   kados: the Afognak job was filled with errors like what you reported just now until I wrote my own MARC-8 conversion code
23:15 thd   kados: I did still have problems but they were reduced from many many to about 3 records in 500
23:16 thd   kados: the resulting records all looked fine on my system so I assumed they had worked correctly
23:17 thd   kados: they did not look fine on Afognak's system but after testing extensively I concluded that my code was not at fault
23:19 thd   kados: I think the problem that MARC::File::XML is having is parsing the records first before converting to UTF-8
23:19 kados it looks like I can just call marc8_to_utf8 now
23:20 kados to convert each subfield
23:20 thd   kados: MARC::Record does not care about encoding
23:21 thd   kados: so using MARC::Record allows you to open and step through the record without having immediate errors
23:22 kados I will look at your code again
23:23 thd   kados: with M::F::X exclusively, if you want to open the record without errors you must convert to UTF-8 first, but if that is your only tool for UTF-8 conversion then you have a problem
23:29 thd   kados: I think importing with M::F::X should be a two-stage process ...
23:30 thd   first open the record with MARC::Record and convert encoding
23:30 thd   save the record with MARC::Record
23:31 thd   then use MARC::File::XML to do something after the record is safely converted and now parsable in XML
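The two stages thd describes could be sketched roughly as follows. This is not the code from bulkmarcimport.pl; convert_record_to_utf8 is a hypothetical helper, and the handling of a failed conversion rests on the assumption that marc8_to_utf8 returns an undefined value when it cannot map the input.

```perl
use strict;
use warnings;
use MARC::Batch;
use MARC::Record;
use MARC::Field;
use MARC::Charset qw(marc8_to_utf8);
use MARC::File::XML;    # adds as_xml_record() to MARC::Record

# Stage 1: rewrite every data subfield from MARC-8 to UTF-8, no XML involved.
sub convert_record_to_utf8 {
    my ($record) = @_;
    for my $field ( $record->fields() ) {
        next if $field->is_control_field();    # tags below 010 have no subfields
        my @converted;
        for my $pair ( $field->subfields() ) {
            my ( $code, $value ) = @$pair;
            my $utf8 = marc8_to_utf8($value);
            # Assumption: undef means the value could not be mapped, so the
            # original bytes are kept and flagged for manual inspection.
            warn "unconverted subfield \$$code in " . $field->tag() . "\n"
                unless defined $utf8;
            push @converted, $code, ( defined $utf8 ? $utf8 : $value );
        }
        next unless @converted;
        # Rebuild the field so that repeated subfield codes are all updated.
        $field->replace_with(
            MARC::Field->new(
                $field->tag(), $field->indicator(1),
                $field->indicator(2), @converted,
            )
        );
    }
    # Leader position 09 set to 'a' marks the record as UCS/Unicode.
    my $leader = $record->leader();
    substr( $leader, 9, 1 ) = 'a';
    $record->leader($leader);
    return $record;
}

if (@ARGV) {
    my $batch = MARC::Batch->new( 'USMARC', $ARGV[0] );
    $batch->strict_off();    # do not abort the batch on a structural error
    my $count = 0;
    while ( my $record = $batch->next() ) {
        convert_record_to_utf8($record);
        # Stage 2: only now hand the record to the XML layer.
        my $xml = $record->as_xml_record();
        $count++;
    }
    print "converted $count records\n";
}
```

Keeping the conversion in its own function makes it easy to run against a single suspect record, such as #161 above, before involving any XML parsing.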
23:36 thd   after skipping the 3 problem records, there were 2 records with a character or 2 which may have been invalid MARC-8 and could not be converted to UTF-8, so the whole subfield was deleted with the code I had at the time
23:37 thd   so that was about 5 problems in 500 records
23:45 thd   kados: does record # 161 pass after using marc8_to_utf8?
23:46 kados thd: I haven't had a chance to write it yet
23:46 kados thd: got distracted ;-)
00:21 thd   kados: look at the statement of responsibility for LCCN 79106336
00:21 thd   Dating the Icelandic sagas
00:22 thd   http://zoomdemo.liblime.com/bib/1972
00:26 thd   kados: do you see an Asian language glyph in the statement of responsibility for http://zoomdemo.liblime.com/bib/1972
00:26 thd   ?
05:44 hdl   hi
10:40 kados paul_lunch: are you around?
10:41 paul  it's almost 3PM in France. fortunately, lunch is over ;-)
10:41 kados heh
10:41 paul  'morning joshua
10:41 kados hi ...
10:41 kados http://zoomdemo.liblime.com/search?q=test
10:41 kados click on a detail page
10:41 paul  done.
10:42 kados it's built in about 5 lines of perl
10:42 paul  and I bet 10000 that it's marc21 specific ;-)
10:42 kados my $xmlrecord = C4::Biblio::getRecord("biblioserver","Local-number=$biblionumber");
10:42 kados my $xslfile = "/home/kohacat/etc/xslt/MARC21slim2English.xsl";
10:42 kados my $parser = XML::LibXML->new();
10:42 kados my $xslt = XML::LibXSLT->new();
10:42 kados my $source = $parser->parse_string($xmlrecord);
10:42 kados my $style_doc = $parser->parse_file($xslfile);
10:42 kados my $stylesheet = $xslt->parse_stylesheet($style_doc);
10:42 kados my $results = $stylesheet->transform($source);
10:42 kados my $newxmlrecord = $stylesheet->output_string($results);
10:42 kados more than 5 ...
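The lines kados pasted depend on C4::Biblio::getRecord and on a stylesheet path on his server. A self-contained sketch of the same XML::LibXML / XML::LibXSLT call sequence, assuming a made-up one-field MARCXML record and a tiny inline stylesheet (both hypothetical stand-ins for the real record and for MARC21slim2English.xsl), might be:

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical minimal MARCXML record; a real one would come from the catalog.
my $xmlrecord = <<'XML';
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Dating the Icelandic sagas</subfield>
  </datafield>
</record>
XML

# Hypothetical stylesheet that prints only the title proper (245 $a).
my $xsl = <<'XSL';
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:marc="http://www.loc.gov/MARC21/slim">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:text>Title: </xsl:text>
    <xsl:value-of select="//marc:datafield[@tag='245']/marc:subfield[@code='a']"/>
  </xsl:template>
</xsl:stylesheet>
XSL

my $parser = XML::LibXML->new();
my $xslt   = XML::LibXSLT->new();

my $source     = $parser->parse_string($xmlrecord);      # the record to format
my $style_doc  = $parser->parse_string($xsl);            # the stylesheet as XML
my $stylesheet = $xslt->parse_stylesheet($style_doc);    # compile it once
my $results    = $stylesheet->transform($source);        # apply it to the record
my $text       = $stylesheet->output_string($results);   # serialize the result

print $text, "\n";
```

The stylesheet matches on MARC 21 tag 245, so it is MARC21-specific in the way paul suspects; another MARC flavour would need its own stylesheet.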
10:43 kados but it's the first time I've attempted to use XSLT to format a MARCXML record
10:44 paul  it's really great & powerful. what does the xslt look like ?
10:44 kados http://www.loc.gov/standards/marcxml/xslt/MARC21slim2English.xsl
10:45 kados I have tested several of the xslt files on this page: http://www.loc.gov/standards/marcxml/
10:45 kados they can all be processed as above
10:45 paul  loc.gov doesn't answer on this side of the ocean...
10:45 paul  (or very very very slowly...)
10:45 kados weird
10:45 paul  it answered in 50 seconds
10:46 kados zoomdemo.liblime.com/MARC21slim2English.xsl
10:47 paul  which Perl packages does it require?
10:47 paul  how fast/slow is it? (if you tested for speed)
10:48 kados I haven't tested speed ... it requires XML::LibXML and XML::LibXSLT
10:49 kados I decided to just play with it to see if it's worth pursuing
10:49 kados and it turns out the code to do the transformation is simpler than I thought
10:49 kados owen: time to brush up on your XSLT ;-)
10:49 owen  I think to brush up I'd need to have some to brush.
10:51 kados hdl mentioned XSLT at our kohacon
10:53 hdl   kados paul: the only problem is to build good and thorough xsl files from frameworks... which could be accomplished with a good xml framework description.
10:53 kados hi hdl
10:54 kados hdl: do you think we could replace a framework with XSLT?
10:54 hdl   yes.
10:54 hdl   But not at the moment.
10:54 hdl   Would take some time.
10:55 hdl   1st step would be to define a good DTD for frameworks.
10:55 hdl   That would be a base for input, output and summaries.
10:56 hdl   But then parsing xml frameworks to produce xslt would be nice.
10:56 hdl   And xslt would parse xml records to produce correct HTML.
10:57 hdl   But we need to be quite precise in the framework description
10:57 kados didn't we have such a definition for opencataloger?
10:57 kados I thought toins created it
10:58 paul  toins created something close to our current frameworks, you're right