4 Commits
0.3.1 ... 0.4.0

Author SHA1 Message Date
zynode
70366e3085 parseCSV 0.4 beta
- Error reporting for files/data which is corrupt
  or has formatting errors like using double
  quotes in a field without enclosing quotes. Or
  not escaping double quotes with a second one.

- parse() method does not require input anymore
  if the "$object->file" property has been set.

I'm calling this a beta release due to the heavy
modifications to the core parsing logic required
for error reporting to work. I have tested the
new code quite extensively, I'm fairly confident
that it still parses exactly as it always has.

The second reason I'm calling it a beta release
is cause I'm sure the error reporting code will
need more refinements and tweaks to detect more
types of errors, as it's only picking two types
or syntax errors right now. However, it seems
these two are the most common errors that you
would be likely to come across.

git-svn-id: http://parsecsv-for-php.googlecode.com/svn/trunk@28 339761fc-0c37-0410-822d-8b8cac1f6a97
2008-04-11 18:13:37 +00:00
zynode
2dfd35b988 added error reporting/validation of parsed data, still needs some more testing before release tho...
git-svn-id: http://parsecsv-for-php.googlecode.com/svn/trunk@27 339761fc-0c37-0410-822d-8b8cac1f6a97
2008-04-07 16:17:44 +00:00
zynode
ae00f949f0 parseCSV 0.3.2
This is primarily a bug-fix release for a critical
bug which was brought to my attention.

- Fixed a critical bug in conditions parsing which
  would generate corrupt matching patterns causing
  the condition(s) to not work at all in some
  situations.

- Fixed a small code error which would cause PHP to
  generate a invalid offset notice when zero length
  values were fed into the unparse() method to
  generate CSV data from an array.

git-svn-id: http://parsecsv-for-php.googlecode.com/svn/trunk@22 339761fc-0c37-0410-822d-8b8cac1f6a97
2008-03-31 22:50:36 +00:00
zynode
4e76da5eff minor fix to a bug which caused notice errors to be generated when _enclose_value() was fed a zero character long string
git-svn-id: http://parsecsv-for-php.googlecode.com/svn/trunk@20 339761fc-0c37-0410-822d-8b8cac1f6a97
2008-03-31 20:55:33 +00:00
4 changed files with 172 additions and 48 deletions

View File

@@ -1,3 +1,58 @@
parseCSV 0.4 beta
-----------------------------------
Date: 11-Apr-2008
- Error reporting for files/data which is corrupt
or has formatting errors like using double
quotes in a field without enclosing quotes. Or
not escaping double quotes with a second one.
- parse() method does not require input anymore
if the "$object->file" property has been set.
I'm calling this a beta release due to the heavy
modifications to the core parsing logic required
for error reporting to work. I have tested the
new code quite extensively, I'm fairly confident
that it still parses exactly as it always has.
The second reason I'm calling it a beta release
is cause I'm sure the error reporting code will
need more refinements and tweaks to detect more
types of errors, as it's only picking two types
or syntax errors right now. However, it seems
these two are the most common errors that you
would be likely to come across.
-----------------------------------
parseCSV 0.3.2
-----------------------------------
Date: 1-Apr-2008
This is primarily a bug-fix release for a critical
bug which was brought to my attention.
- Fixed a critical bug in conditions parsing which
would generate corrupt matching patterns causing
the condition(s) to not work at all in some
situations.
- Fixed a small code error which would cause PHP to
generate a invalid offset notice when zero length
values were fed into the unparse() method to
generate CSV data from an array.
Notice: If you have been using the "parsecsv-stable"
branch as an external in any of your projects,
please use the "stable/parsecsv" branch from this
point on as I will eventually remove the former due
to it's stupid naming.
-----------------------------------
parseCSV 0.3.1
-----------------------------------
Date: 1-Sep-2007
@@ -31,6 +86,9 @@ Date: 9-Aug-2007
- Minor changes and optimizations, and a few
spelling corrections. Oops :)
- Included more complex code examples in the
parseCSV download.
-----------------------------------

View File

@@ -1,15 +1,15 @@
rating,title,author,type,asin,tags,review
0,The Killing Kind,John Connolly,Book,0340771224,,i still haven't had time to read this one...
0,The Third Secret,Steve Berry,Book,0340899263,,need to find time to read this book
3,The Last Templar,Raymond Khoury,Book,0752880705,,
5,The Traveller,John Twelve Hawks,Book,059305430X,,
4,Crisis Four,Andy Mcnab,Book,0345428080,,
5,Prey,Michael Crichton,Book,0007154534,,
3,The Broker (Paperback),John Grisham,Book,0440241588,book johngrisham,"good book, but is slow in the middle"
3,Without Blood (Paperback),Alessandro Baricco,Book,1841955744,,
5,State of Fear (Paperback),Michael Crichton,Book,0061015733,,
4,The Rule of Four (Paperback),Ian Caldwell,Book,0099451956,book bestseller,
4,Deception Point (Paperback),Dan Brown,Book,0671027387,book danbrown bestseller,
5,Digital Fortress : A Thriller (Mass Market Paperback),Dan Brown,Book,0312995423,book danbrown bestseller,
5,Angels & Demons (Mass Market Paperback),Dan Brown,Book,0671027360,book danbrown bestseller,
rating,title,author,type,asin,tags,review
0,The Killing Kind,John Connolly,Book,0340771224,,i still haven't had time to read this one...
0,The Third Secret,Steve Berry,Book,0340899263,,need to find time to read this book
3,The Last Templar,Raymond Khoury,Book,0752880705,,
5,The Traveller,John Twelve Hawks,Book,059305430X,,
4,Crisis Four,Andy Mcnab,Book,0345428080,,
5,Prey,Michael Crichton,Book,0007154534,,
3,The Broker (Paperback),John Grisham,Book,0440241588,book johngrisham,"good book, but is slow in the middle"
3,Without Blood (Paperback),Alessandro Baricco,Book,1841955744,,
5,State of Fear (Paperback),Michael Crichton,Book,0061015733,,
4,The Rule of Four (Paperback),Ian Caldwell,Book,0099451956,book bestseller,
4,Deception Point (Paperback),Dan Brown,Book,0671027387,book danbrown bestseller,
5,Digital Fortress : A Thriller (Mass Market Paperback),Dan Brown,Book,0312995423,book danbrown bestseller,
5,Angels & Demons (Mass Market Paperback),Dan Brown,Book,0671027360,book danbrown bestseller,
4,The Da Vinci Code (Hardcover),Dan Brown," Book ",0385504209,book movie danbrown bestseller davinci,
1 rating title author type asin tags review
2 0 The Killing Kind John Connolly Book 0340771224 i still haven't had time to read this one...
3 0 The Third Secret Steve Berry Book 0340899263 need to find time to read this book
4 3 The Last Templar Raymond Khoury Book 0752880705
5 5 The Traveller John Twelve Hawks Book 059305430X
6 4 Crisis Four Andy Mcnab Book 0345428080
7 5 Prey Michael Crichton Book 0007154534
8 3 The Broker (Paperback) John Grisham Book 0440241588 book johngrisham good book, but is slow in the middle
9 3 Without Blood (Paperback) Alessandro Baricco Book 1841955744
10 5 State of Fear (Paperback) Michael Crichton Book 0061015733
11 4 The Rule of Four (Paperback) Ian Caldwell Book 0099451956 book bestseller
12 4 Deception Point (Paperback) Dan Brown Book 0671027387 book danbrown bestseller
13 5 Digital Fortress : A Thriller (Mass Market Paperback) Dan Brown Book 0312995423 book danbrown bestseller
14 5 Angels & Demons (Mass Market Paperback) Dan Brown Book 0671027360 book danbrown bestseller
15 4 The Da Vinci Code (Hardcover) Dan Brown Book 0385504209 book movie danbrown bestseller davinci

View File

@@ -13,7 +13,11 @@ $csv = new parseCSV();
# Parse '_books.csv' using automatic delimiter detection...
$csv->auto('_books.csv');
# ...or if you know the delimiter, use the parse() function.
# ...or if you know the delimiter, set the delimiter character
# if its not the default comma...
// $csv->delimiter = "\t"; # tab delimited
# ...and then use the parse() function.
// $csv->parse('_books.csv');

View File

@@ -4,7 +4,7 @@ class parseCSV {
/*
Class: parseCSV v0.3.1
Class: parseCSV v0.4 beta
http://code.google.com/p/parsecsv-for-php/
@@ -140,6 +140,19 @@ class parseCSV {
# loaded file contents
var $file_data;
# error while parsing input data
# 0 = No errors found. Everything should be fine :)
# 1 = Hopefully correctable syntax error was found.
# 2 = Enclosure character (double quote by default)
# was found in non-enclosed field. This means
# the file is either corrupt, or does not
# standard CSV formatting. Please validate
# the parsed data yourself.
var $error = 0;
# detailed error info
var $error_info = array();
# array of field values in data parsed
var $titles = array();
@@ -170,6 +183,7 @@ class parseCSV {
* @return nothing
*/
function parse ($input = null, $offset = null, $limit = null, $conditions = null) {
if ( $input === null ) $input = $this->file;
if ( !empty($input) ) {
if ( $offset !== null ) $this->offset = $offset;
if ( $limit !== null ) $this->limit = $limit;
@@ -272,13 +286,13 @@ class parseCSV {
$pch = ( isset($data{$i-1}) ) ? $data{$i-1} : false ;
// open and closing quotes
if ( $ch == $enclosure && (!$enclosed || $nch != $enclosure) ) {
$enclosed = ( $enclosed ) ? false : true ;
// inline quotes
} elseif ( $ch == $enclosure && $enclosed ) {
$i++;
if ( $ch == $enclosure ) {
if ( !$enclosed || $nch != $enclosure ) {
$enclosed = ( $enclosed ) ? false : true ;
} elseif ( $enclosed ) {
$i++;
}
// end of row
} elseif ( ($ch == "\n" && $pch != "\r" || $ch == "\r") && !$enclosed ) {
if ( $n >= $search_depth ) {
@@ -311,13 +325,12 @@ class parseCSV {
// capture most probable delimiter
ksort($filtered);
$delimiter = reset($filtered);
$this->delimiter = $delimiter;
$this->delimiter = reset($filtered);
// parse data
if ( $parse ) $this->data = $this->parse_string();
return $delimiter;
return $this->delimiter;
}
@@ -349,6 +362,8 @@ class parseCSV {
} else return false;
}
$white_spaces = str_replace($this->delimiter, '', " \t\x0B\0");
$rows = array();
$row = array();
$row_count = 0;
@@ -365,24 +380,68 @@ class parseCSV {
$nch = ( isset($data{$i+1}) ) ? $data{$i+1} : false ;
$pch = ( isset($data{$i-1}) ) ? $data{$i-1} : false ;
// open and closing quotes
if ( $ch == $this->enclosure && (!$enclosed || $nch != $this->enclosure) ) {
$enclosed = ( $enclosed ) ? false : true ;
if ( $enclosed ) $was_enclosed = true;
// inline quotes
} elseif ( $ch == $this->enclosure && $enclosed ) {
$current .= $ch;
$i++;
// open/close quotes, and inline quotes
if ( $ch == $this->enclosure ) {
if ( !$enclosed ) {
if ( ltrim($current, $white_spaces) == '' ) {
$enclosed = true;
$was_enclosed = true;
} else {
$this->error = 2;
$error_row = count($rows) + 1;
$error_col = $col + 1;
if ( !isset($this->error_info[$error_row.'-'.$error_col]) ) {
$this->error_info[$error_row.'-'.$error_col] = array(
'type' => 2,
'info' => 'Syntax error found on row '.$error_row.'. Non-enclosed fields can not contain double-quotes.',
'row' => $error_row,
'field' => $error_col,
'field_name' => (!empty($head[$col])) ? $head[$col] : null,
);
}
$current .= $ch;
}
} elseif ($nch == $this->enclosure) {
$current .= $ch;
$i++;
} elseif ( $nch != $this->delimiter && $nch != "\r" && $nch != "\n" ) {
for ( $x=($i+1); isset($data{$x}) && ltrim($data{$x}, $white_spaces) == ''; $x++ ) {}
if ( $data{$x} == $this->delimiter ) {
$enclosed = false;
$i = $x;
} else {
if ( $this->error < 1 ) {
$this->error = 1;
}
$error_row = count($rows) + 1;
$error_col = $col + 1;
if ( !isset($this->error_info[$error_row.'-'.$error_col]) ) {
$this->error_info[$error_row.'-'.$error_col] = array(
'type' => 1,
'info' =>
'Syntax error found on row '.(count($rows) + 1).'. '.
'A single double-quote was found within an enclosed string. '.
'Enclosed double-quotes must be escaped with a second double-quote.',
'row' => count($rows) + 1,
'field' => $col + 1,
'field_name' => (!empty($head[$col])) ? $head[$col] : null,
);
}
$current .= $ch;
$enclosed = false;
}
} else {
$enclosed = false;
}
// end of field/row
} elseif ( ($ch == $this->delimiter || ($ch == "\n" && $pch != "\r") || $ch == "\r") && !$enclosed ) {
if ( !$was_enclosed ) $current = trim($current);
} elseif ( ($ch == $this->delimiter || $ch == "\n" || $ch == "\r") && !$enclosed ) {
$key = ( !empty($head[$col]) ) ? $head[$col] : $col ;
$row[$key] = $current;
$row[$key] = ( $was_enclosed ) ? $current : trim($current) ;
$current = '';
$was_enclosed = false;
$col++;
// end of row
if ( $ch == "\n" || $ch == "\r" ) {
if ( $this->_validate_offset($row_count) && $this->_validate_row_conditions($row, $this->conditions) ) {
@@ -405,6 +464,7 @@ class parseCSV {
if ( $this->sort_by === null && $this->limit !== null && count($rows) == $this->limit ) {
$i = $strlen;
}
if ( $ch == "\r" && $nch == "\n" ) $i++;
}
// append character to current field
@@ -503,11 +563,11 @@ class parseCSV {
function _validate_row_conditions ($row = array(), $conditions = null) {
if ( !empty($row) ) {
if ( !empty($conditions) ) {
$conditions = (strpos($conditions, 'OR') !== false) ? explode('OR', $conditions) : array($conditions) ;
$conditions = (strpos($conditions, ' OR ') !== false) ? explode(' OR ', $conditions) : array($conditions) ;
$or = '';
foreach( $conditions as $key => $value ) {
if ( strpos($value, 'AND') !== false ) {
$value = explode('AND', $value);
if ( strpos($value, ' AND ') !== false ) {
$value = explode(' AND ', $value);
$and = '';
foreach( $value as $k => $v ) {
$and .= $this->_validate_row_condition($row, $v);
@@ -601,11 +661,13 @@ class parseCSV {
* @return Processed value
*/
function _enclose_value ($value = null) {
$delimiter = preg_quote($this->delimiter, '/');
$enclosure = preg_quote($this->enclosure, '/');
if ( preg_match("/".$delimiter."|".$enclosure."|\n|\r/i", $value) || ($value{0} == ' ' || substr($value, -1) == ' ') ) {
$value = str_replace($this->enclosure, $this->enclosure.$this->enclosure, $value);
$value = $this->enclosure.$value.$this->enclosure;
if ( $value !== null && $value != '' ) {
$delimiter = preg_quote($this->delimiter, '/');
$enclosure = preg_quote($this->enclosure, '/');
if ( preg_match("/".$delimiter."|".$enclosure."|\n|\r/i", $value) || ($value{0} == ' ' || substr($value, -1) == ' ') ) {
$value = str_replace($this->enclosure, $this->enclosure.$this->enclosure, $value);
$value = $this->enclosure.$value.$this->enclosure;
}
}
return $value;
}