# CSV Spec

CSV is not a file format; it is typically a loose set of guidelines for how to structure tabular data as a plain text string. As a result, there is an endless number of `*.csv` files floating around that are highly incompatible with each other. The closest thing there is to a specification is [RFC 4180][].

[rfc 4180]: http://tools.ietf.org/html/rfc4180

## Goals

This project is an attempt to summarize RFC 4180 and the information in the [Comma-separated values (CSV)][csv] Wikipedia article in an easy-to-understand format. The spec also takes into account that the comma (`,`) is not the only character used as a field delimiter. Semi-colons (`;`), tabs (`\t`), and others are popular field delimiter characters. As such, the specification more accurately describes a CSV-like structured data format.

[csv]: http://en.wikipedia.org/wiki/Comma-separated_values

We will also provide input/output test files that CSV parser/writer software libraries can use to validate whether they properly adhere to the rules laid out in this specification. If possible, we will also try to provide code snippets in various languages that attempt to automatically determine the delimiter character used in any given CSV-like input file or data.

## Roadmap

1. Write up core specification rules. _[in-progress]_
2. Create input/output test files covering all rules in the specification.
3. Create website for [csv-spec.org](http://csv-spec.org/).
4. Create a linting tool as an NPM module, allowing easy validation of CSV data both client-side in a web browser and server-side via a command line tool.
5. Create automatic delimiter character detection code snippets in various programming languages, which CSV parser developers can freely use to enhance their libraries.

## Terminology

- **Field** — A single string value within a row.
- **Record** (or **Row**) — A collection of fields.
- **Column** — Fields from multiple rows at the same offset.
  For example, the second column is the list of the second field from every row.
- **Delimiter** — The character used to separate fields within a row. Commonly this is a comma (`,`), but semi-colons (`;`) and tabs (`\t`) are two other popular delimiter characters.
- **Header** — The first row is often used to contain the column names for all remaining rows. Header names are used as key names when, for example, CSV data is converted to JSON.
- **Line Break** — Line breaks in CSV files should be CRLF (`\r\n`).

## CSV Format Definition

- These rules are mostly based on the corresponding section from [RFC 4180][def], with minor changes, clarifications and improved examples.
- Where relevant, examples include both the CSV text version and the equivalent data in JSON format.
- Line breaks in the CSV examples are displayed as `¬`.

[def]: http://tools.ietf.org/html/rfc4180#section-2

### Rules

1. Each **record** is located on a separate line, each line ending with a **line break** (CRLF, or `\r\n`). For example:

   CSV:

   ```csv
   aaa,bbb,ccc¬
   xxx,yyy,zzz¬
   ```

   JSON:

   ```json
   [
     ["aaa", "bbb", "ccc"],
     ["xxx", "yyy", "zzz"]
   ]
   ```

2. Though recommended, the last record in a file is not required to have an ending **line break**. For example:

   CSV:

   ```csv
   aaa,bbb,ccc¬
   xxx,yyy,zzz
   ```

   JSON:

   ```json
   [
     ["aaa", "bbb", "ccc"],
     ["xxx", "yyy", "zzz"]
   ]
   ```

3. There may be an optional **header** line appearing as the first line of the file, with the same format as normal **record** lines. This **header** contains names corresponding to the **fields** in the file and should contain the same number of **fields** as the **records** in the rest of the file.
   For example:

   CSV:

   ```csv
   field_1,field_2,field_3¬
   aaa,bbb,ccc¬
   xxx,yyy,zzz¬
   ```

   JSON (ignoring headers):

   ```json
   [
     ["field_1", "field_2", "field_3"],
     ["aaa", "bbb", "ccc"],
     ["xxx", "yyy", "zzz"]
   ]
   ```

   JSON (using headers):

   ```json
   [
     {"field_1": "aaa", "field_2": "bbb", "field_3": "ccc"},
     {"field_1": "xxx", "field_2": "yyy", "field_3": "zzz"}
   ]
   ```

## License

[CC0 1.0 Universal](http://creativecommons.org/publicdomain/zero/1.0/)
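
## Example: Parsing with Rules 1 to 3

As a rough illustration of rules 1 to 3 above, here is a minimal Python sketch. It is not part of the specification. The function name `parse_simple_csv` and its `delimiter` and `header` parameters are hypothetical, invented for this example. The sketch assumes that fields are never quoted or escaped, treats empty input as zero records, and expects the `¬` markers from the examples to be real `\r\n` line breaks.

```python
def parse_simple_csv(text, delimiter=",", header=False):
    """Parse CSV-like text using only rules 1-3 (assumes no quoting or escaping)."""
    # Rule 2: the ending line break on the last record is optional,
    # so drop a single trailing CRLF if there is one.
    if text.endswith("\r\n"):
        text = text[:-2]
    # Assumption of this sketch: empty input means zero records.
    if text == "":
        return []
    # Rule 1: records are separated by CRLF, fields by the delimiter.
    rows = [line.split(delimiter) for line in text.split("\r\n")]
    if not header:
        return rows
    # Rule 3: treat the first record as the header and use its
    # fields as key names for every remaining record.
    names, records = rows[0], rows[1:]
    return [dict(zip(names, record)) for record in records]


# The CSV example from rule 2 (no ending line break on the last record).
print(parse_simple_csv("aaa,bbb,ccc\r\nxxx,yyy,zzz"))
# [['aaa', 'bbb', 'ccc'], ['xxx', 'yyy', 'zzz']]
```

Passing `header=True` returns one dictionary per record, matching the "JSON (using headers)" form, and passing `delimiter=";"` or `delimiter="\t"` covers the other delimiter characters mentioned under Terminology. A real parser would also need to handle quoting, which this sketch deliberately leaves out.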