Time Nick Message
14:58 thd kados: are you there?
15:00 kados hi thd
15:00 kados thd: what can I do for you? ;-)
15:00 thd hello
15:00 dewey privet, thd
22:55 kados thd: here are some of the errors:
22:55 kados #161 has a problem: Premature end of data in tag subfield line 49
22:55 kados Premature end of data in tag datafield line 48
22:55 kados Premature end of data in tag record line 7
22:55 kados Premature end of data in tag collection line 2 at /usr/local/lib/perl/5.8.4/XML/LibXML/SAX.pm line 64
22:55 kados at /usr/local/share/perl/5.8.4/MARC/File/XML.pm line 450
22:57 thd kados: that is a record parsing error
22:58 thd we need to get those to Ed so he can fix the underlying library
22:59 kados did he make promises to that effect?
22:59 kados not to me anyway ;-)
23:00 thd kados: I had a very good feeling about how he expressed something similar if we had reproducible problems which we had made a reasonable effort to trace
23:00 kados what constitutes reasonable effort?
23:01 thd kados: well, testing to be certain that the record itself is not unreasonably misencoded
23:02 kados it doesn't throw an error in marcdump
23:02 kados is that enough testing? :-)
23:03 thd kados: what happens if you use the Koha rel 2_2 bulkmarcimport.pl with logging set to maximum verbosity -vv
23:04 thd kados: that would use MARC record instead of MARC-XML
23:05 kados there is only one perl method to convert to utf8 using the codetables.xml file provided by LOC
23:05 kados that's provided by MARC::File::XML
23:06 thd kados: is that what ARC
23:07 thd kados: is that what MARC::Charset uses?
23:08 kados no, but it uses MARC::Charset ;-)
23:10 thd kados: I think that using the MARC-8 to UTF-8 function which I added to bulkmarcimport.pl may convert the record without triggering SAX errors
23:11 thd kados: If we can convert the encoding first without involving XML we eliminate most sources of error
23:12 kados thd: but your code doesn't utilize the codetables.xml file
23:12 kados thd: that MARC::Charset uses
23:12 kados at least I don't think it does
23:12 thd my code uses MARC::Charset
23:13 kados and I'm also not sure if it properly handles the conversion of multi-byte characters, especially handling of 'combining characters' since MARC-8 and UTF-8 ordering is so different
23:13 thd kados: the Afognak job was filled with errors like what you reported just now until I wrote my own MARC-8 conversion code
23:15 thd kados: I did still have problems but they were reduced from many many to about 3 records in 500
23:16 thd kados: the resulting records all looked fine on my system so I assumed they had worked correctly
23:17 thd kados: they did not look fine on Afognak's system but after testing extensively I concluded that my code was not at fault
23:19 thd kados: I think the problem that MARC::File::XML is having is parsing the records first before converting to UTF-8
23:19 kados it looks like I can just call marc8_to_utf8 now
23:20 kados to convert each subfield
23:20 thd kados: MARC::Record does not care about encoding
23:21 thd kados: so using MARC::Record allows you to open and step through the record without having immediate errors
23:22 kados I will look at your code again
23:23 thd kados: with M::F::X exclusively, if you want to open the record without errors you must convert to UTF-8 first, but if that is your only tool for UTF-8 conversion then you have a problem
23:29 thd kados: I think importing with M::F::X should be a two stage process ...
23:30 thd first open the record with MARC::Record and convert encoding
23:30 thd save the record with MARC::Record
23:31 thd then use MARC::File::XML to do something after the record is safely converted and now parsable in XML
23:36 thd after skipping the 3 problem records there were 2 records with a character or 2 which may have been invalid MARC-8 and could not be converted to UTF-8 so the whole subfield was deleted with the code I had at the time
23:37 thd so that was about 5 problems in 500 records
23:45 thd kados: does record # 161 pass after using marc8_to_utf8?
23:46 kados thd: I haven't had a chance to write it yet
23:46 kados thd: got distracted ;-)
00:21 thd kados: look at the statement of responsibility for LCCN 79106336
00:21 thd Dating the Icelandic sagas
00:22 thd http://zoomdemo.liblime.com/bib/1972
00:26 thd kados: do you see an Asian language glyph in the statement of responsibility for http://zoomdemo.liblime.com/bib/1972
00:26 thd ?
05:44 hdl hi
10:40 kados paul_lunch: are you around?
10:41 paul it's almost 3PM in France. fortunately, lunch over ;-)
10:41 kados heh
10:41 paul 'morning joshua
10:41 kados hi ...
10:41 kados http://zoomdemo.liblime.com/search?q=test
10:41 kados click on a detail page
10:41 paul done.
10:42 kados it's built in about 5 lines of perl
10:42 paul and I bet 10000 that it's marc21 specific ;-)
10:42 kados my $xmlrecord = C4::Biblio::getRecord("biblioserver","Local-number=$biblionumber");
10:42 kados my $xslfile = "/home/kohacat/etc/xslt/MARC21slim2English.xsl";
10:42 kados my $parser = XML::LibXML->new();
10:42 kados my $xslt = XML::LibXSLT->new();
10:42 kados my $source = $parser->parse_string($xmlrecord);
10:42 kados my $style_doc = $parser->parse_file($xslfile);
10:42 kados my $stylesheet = $xslt->parse_stylesheet($style_doc);
10:42 kados my $results = $stylesheet->transform($source);
10:42 kados my $newxmlrecord = $stylesheet->output_string($results);
10:42 kados more than 5 ...
10:43 kados but it's the first time I've attempted to use XSLT to format a MARCXML record
10:44 paul it's really great & powerful. what does the xslt look like ?
10:44 kados http://www.loc.gov/standards/marcxml/xslt/MARC21slim2English.xsl
10:45 kados I have tested several of the xslt files on this page: http://www.loc.gov/standards/marcxml/
10:45 kados they can all be processed as above
10:45 paul loc.gov doesn't answer on this side of the ocean...
10:45 paul (or very very very slowly...)
10:45 kados weird
10:45 paul answer in 50 seconds
10:46 kados zoomdemo.liblime.com/MARC21slim2English.xsl
10:47 paul which perl packages does it require ?
10:47 paul how fast/slow is it ? (if you tested for speed)
10:48 kados I haven't tested speed ... it requires XML::LibXML and XML::LibXSLT
10:49 kados I decided to just play with it to see if it's worth pursuing
10:49 kados and it turns out the code to do the transformation is simpler than I thought
10:49 kados owen: time to brush up on your XSLT ;-)
10:49 owen I think to brush up I'd need to have some to brush.
10:51 kados hdl mentioned XSLT at our kohacon
10:53 hdl kados paul : the only pb is to build good and thorough xsl files from frameworks.... which could be accomplished with a good xml framework description.
10:53 kados hi hdl
10:54 kados hdl: do you think we could replace a framework with XSLT?:
10:54 hdl yes.
10:54 hdl But not at the moment.
10:54 hdl Would take some time.
10:55 hdl 1st step would be to define a good DTD for frameworks.
10:55 hdl That would be a base for input, output and summaries.
10:56 hdl But then parsing xml frameworks to produce xslt would be nice.
10:56 hdl And xslt would parse xml records to produce correct HTML.
10:57 hdl But the need is to be quite precise in the framework description
10:57 kados didn't we have such a definition for opencataloger?
10:57 kados I thought toins created it
10:58 paul toins created something close from our actual frameworks, you're right
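
[Editor's sketch] The two-stage import thd outlines at 23:29-23:31 (convert the encoding with MARC::Record first, hand the record to MARC::File::XML only afterwards) might look roughly like the Perl below. This is not thd's bulkmarcimport.pl code, which is not shown in the log. The sub name `convert_record_to_utf8` and the choice to keep an unconvertible subfield unchanged are illustrative assumptions, as is the expectation that `marc8_to_utf8` returns undef when it meets a character it cannot map. The sketch also passes the record to stage 2 in memory rather than saving it to disk in between.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use MARC::Batch;
use MARC::Record;
use MARC::Field;
use MARC::Charset qw(marc8_to_utf8);
use MARC::File::XML (BinaryEncoding => 'utf8');

# Stage 1: convert every subfield from MARC-8 to UTF-8 without touching XML.
# Returns a list of "tag$code" labels for subfields that could not be converted.
# (Hypothetical helper; the name is not from the log.)
sub convert_record_to_utf8 {
    my ($record) = @_;
    my @problems;

    for my $field ($record->fields) {
        next if $field->is_control_field;    # 00X fields carry no subfields

        my @converted;
        for my $subfield ($field->subfields) {
            my ($code, $value) = @$subfield;
            # Assumption: undef signals an unmappable MARC-8 sequence.
            my $utf8 = marc8_to_utf8($value);
            if (defined $utf8) {
                push @converted, $code, $utf8;
            } else {
                push @problems, $field->tag . '$' . $code;
                push @converted, $code, $value;    # keep the original bytes
            }
        }
        next unless @converted;

        $field->replace_with(
            MARC::Field->new(
                $field->tag, $field->indicator(1), $field->indicator(2),
                @converted
            )
        );
    }

    # Leader position 09 set to 'a' marks the record as UCS/Unicode.
    my $leader = $record->leader;
    substr($leader, 9, 1) = 'a';
    $record->leader($leader);

    return @problems;
}

# Stage 2: only now involve MARC::File::XML.
# Output-layer encoding (binmode etc.) is deliberately left out of this sketch.
if (@ARGV) {
    my $batch = MARC::Batch->new('USMARC', $ARGV[0]);
    my $count = 0;
    while (my $record = $batch->next) {
        $count++;
        my @problems = convert_record_to_utf8($record);
        if (@problems) {
            warn "record #$count: could not convert @problems\n";
            next;    # skip problem records instead of feeding them to the XML layer
        }
        print $record->as_xml_record, "\n";
    }
}
```

Run against a file of MARC-8 records, this prints one MARCXML record per input record and reports the ones it skipped, which is one way to collect reproducible cases such as record #161.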