Bug in regular expression used for processing CSV and tab-separated data
It appears that the regular expression used in lib/text/table.js doesn't distinguish between tab-separated and CSV data files correctly.
My problem was that one of the tab-separated fields had commas in it. I was not putting quotes around the data.
I suspect (but haven't tested this) that a CVS file may have problems with embedded tabs too.
I could work out how to put quotes around the fields (I'm extracting from a database and its not easy) but a better solution would be to allow an extra arg to the property setting to indicate whether the input file is CSV or tab-separated and then have the regular expression process the data rows accordingly.
I put double quotes around the last data field hoping it would solve the problem but it didn't.
First surprise was that the quotes were displayed. Second surprise was that some of the text also had embedded tabs. These tabs were treated as separators so the rest of the text wasn't displayed.
Oh, well. I'll try making it all CSV and see if that works any better.
and left the quotes around the last field that had embedded tabs. The
tabs were still causing a problem due to the bug mentioned in my first
post in this thread.
I also worked out why the quotes were being displayed. The output file
has trailing blanks. This is caused by the program used to extract data
from the database and write it to a file. The trailing blanks are after the
closing quote encapsulating the last field. Removing those stopped the
quotes from showing.
Due to these problems, I though of writing my own function. Use
another separator like a pipe '|' or something and parse for that only.
Or pass the type of separator used in the args to the function call.
But, in the end, due to time limits, I opted for a simpler solution. Since
I'm using this on a Unix server, I added some code to postprocess the
file via a call to sed - it just strips out redundant spaces at the
end (there's one at the start too - it wasn't causing a problem but I
thought I'd get rid of it anyway) and replaces tabs with spaces.
Here's the code -
sed -e 's/^ //' -e 's/[ ]*\$//' -e 's/ / /' input_file > output_file
The 3 sed edit directives are substitute a leading space with nothing,
substitute a series of trailing spaces with nothing and substitute tabs
That cleans up the output enough that it looks like an acceptable CSV
file to the AW function.
Now, the last issue I have is that the last field is variable length text.
Some of it can get very long. I have to figure out how to display it
wrapped around in a larger row cell. From what I've read I'll need to
embed breaks '<br>' in the string and change the row height.
Not sure how I'll embed breaks but I'll look into it and post a followup
Unfortunately, I'm stuck with perl for the CGI and calls to a Unix shell (to run commands).
I'd like to switch to php but XML depends on the RDBMS vendor we're using. That won't happen overnight.
Still, the real issue is that there's a bug in the reg-ex code AW uses. That needs to be looked at. The simplest solution is to break it up by the type of data (I though of determining that by the filename extension) but using an additional arg that lets you nominate the separator then lets you use anything you want.
Can i use semicolon as delimeter?
is it possible to define separator by some property?
This topic is archived.
Back to support forum
Version 2Looking for AW 2.6 ?
The old site is here.