:: Forum >> Version 1 >>

Bug in regular expression used for processing CSV and tab-separated data

More information on this topic is available in the documentation section: /active.howto.load.csv/.

I am loading tab-separated data into a grid from a file. I was taken by surprise where not all the data was displayed.

It appears that the regular expression used in lib/text/table.js doesn't distinguish between tab-separated and CSV data files correctly.

My problem was that one of the tab-separated fields had commas in it. I was not putting quotes around the data.

I suspect (but haven't tested this) that a CVS file may have problems with embedded tabs too.

I could work out how to put quotes around the fields (I'm extracting from a database and its not easy) but a better solution would be to allow an extra arg to the property setting to indicate whether the input file is CSV or tab-separated and then have the regular expression process the data rows accordingly.
Anthony M.
Thursday, August 4, 2005
Well, it looks like it worse than I though.

I put double quotes around the last data field hoping it would solve the problem but it didn't.

First surprise was that the quotes were displayed. Second surprise was that some of the text also had embedded tabs. These tabs were treated as separators so the rest of the text wasn't displayed.

Oh, well. I'll try making it all CSV and see if that works any better.
Anthony M.
Monday, August 8, 2005
Well, that didn't work either. I replaced the tab separators with commas
and left the quotes around the last field that had embedded tabs. The
tabs were still causing a problem due to the bug mentioned in my first
post in this thread.

I also worked out why the quotes were being displayed. The output file
has trailing blanks. This is caused by the program used to extract data
from the database and write it to a file. The trailing blanks are after the
closing quote encapsulating the last field. Removing those stopped the
quotes from showing.

Due to these problems, I though of writing my own function. Use
another separator like a pipe '|' or something and parse for that only.
Or pass the type of separator used in the args to the function call.

But, in the end, due to time limits, I opted for a simpler solution. Since
I'm using this on a Unix server, I added some code to postprocess the
file via a call to sed - it just strips out redundant spaces at the
end (there's one at the start too - it wasn't causing a problem but I
thought I'd get rid of it anyway) and replaces tabs with spaces.

Here's the code -

sed -'s/^ //' -'s/[   ]*\$//' -'s/  / /' input_file output_file
 
The 3 sed edit directives are substitute a leading space with nothing,
substitute a series of trailing spaces with nothing and substitute tabs
with spaces.

That cleans up the output enough that it looks like an acceptable CSV
file to the AW function.

Now, the last issue I have is that the last field is variable length text.
Some of it can get very long. I have to figure out how to display it
wrapped around in a larger row cell. From what I've read I'll need to
embed breaks '<br>' in the string and change the row height.

Not sure how I'll embed breaks but I'll look into it and post a followup
solution.
Anthony M.
Wednesday, August 10, 2005
I skipped all that and made a VBA script in Excel and made XML. Loading in IE now is what slow (parsing XML is slow) firefox much faster. But it is easily now to load the grid with the data statement (the data is XML escaped so € is becoming &#128; etc).


John Ophof http://open-modeling.sourceforge.net
Wednesday, August 10, 2005
Hi John,

Unfortunately, I'm stuck with perl for the CGI and calls to a Unix shell (to run commands).

I'd like to switch to php but XML depends on the RDBMS vendor we're using. That won't happen overnight.

Still, the real issue is that there's a bug in the reg-ex code AW uses. That needs to be looked at. The simplest solution is to break it up by the type of data (I though of determining that by the filename extension) but using an additional arg that lets you nominate the separator then lets you use anything you want.
Anthony M.
Friday, August 12, 2005
If you are using perl why not just output a javascript array in the page?
Friday, August 12, 2005
Hi All,

Can i use semicolon as delimeter?
is it possible to define separator by some property?

regards

raju
Raju Dave
Tuesday, December 13, 2005



This topic is archived.

Back to /active.howto.load.csv/

Documentation:

Forum search