Flowdock

This class provides a complete interface to CSV files and data. It offers tools to enable you to read and write to and from Strings or IO objects, as needed.

Reading

From a File

A Line at a Time

CSV.foreach("path/to/file.csv") do |row|
  # use row here...
end

All at Once

arr_of_arrs = CSV.read("path/to/file.csv")

From a String

A Line at a Time

CSV.parse("CSV,data,String") do |row|
  # use row here...
end

All at Once

arr_of_arrs = CSV.parse("CSV,data,String")

Writing

To a File

CSV.open("path/to/file.csv", "wb") do |csv|
  csv << ["row", "of", "CSV", "data"]
  csv << ["another", "row"]
  # ...
end

To a String

csv_string = CSV.generate do |csv|
  csv << ["row", "of", "CSV", "data"]
  csv << ["another", "row"]
  # ...
end

Convert a Single Line

csv_string = ["CSV", "data"].to_csv   # to CSV
csv_array  = "CSV,String".parse_csv   # from CSV

Shortcut Interface

CSV             { |csv_out| csv_out << %w{my data here} }  # to $stdout
CSV(csv = "")   { |csv_str| csv_str << %w{my data here} }  # to a String
CSV($stderr)    { |csv_err| csv_err << %w{my data here} }  # to $stderr
CSV($stdin)     { |csv_in|  csv_in.each { |row| p row } }  # from $stdin

Advanced Usage

Wrap an IO Object

csv = CSV.new(io, options)
# ... read (with gets() or each()) from and write (with <<) to csv here ...

CSV and Character Encodings (M17n or Multilingualization)

This new CSV parser is m17n savvy. The parser works in the Encoding of the IO or String object being read from or written to. Your data is never transcoded (unless you ask Ruby to transcode it for you) and will literally be parsed in the Encoding it is in. Thus CSV will return Arrays or Rows of Strings in the Encoding of your data. This is accomplished by transcoding the parser itself into your Encoding.

Some transcoding must take place, of course, to accomplish this multiencoding support. For example, :col_sep, :row_sep, and :quote_char must be transcoded to match your data. Hopefully this makes the entire process feel transparent, since CSV’s defaults should just magically work for you data. However, you can set these values manually in the target Encoding to avoid the translation.

It’s also important to note that while all of CSV’s core parser is now Encoding agnostic, some features are not. For example, the built-in converters will try to transcode data to UTF-8 before making conversions. Again, you can provide custom converters that are aware of your Encodings to avoid this translation. It’s just too hard for me to support native conversions in all of Ruby’s Encodings.

Anyway, the practical side of this is simple: make sure IO and String objects passed into CSV have the proper Encoding set and everything should just work. CSV methods that allow you to open IO objects (CSV::foreach(), CSV::open(), CSV::read(), and CSV::readlines()) do allow you to specify the Encoding.

One minor exception comes when generating CSV into a String with an Encoding that is not ASCII compatible. There’s no existing data for CSV to use to prepare itself and thus you will probably need to manually specify the desired Encoding for most of those cases. It will try to guess using the fields in a row of output though, when using CSV::generate_line() or Array#to_csv().

I try to point out any other Encoding issues in the documentation of methods as they come up.

This has been tested to the best of my ability with all non-“dummy” Encodings Ruby ships with. However, it is brave new code and may have some bugs. Please feel free to report any issues you find with it.

Constants

VERSION = "2.4.8".freeze

FieldInfo = Struct.new(:index, :line, :header)

DateMatcher = / \A(?: (\w+,?\s+)?\w+\s+\d{1,2},?\s+\d{2,4} | \d{4}-\d{2}-\d{2} )\z /x

DateTimeMatcher = / \A(?: (\w+,?\s+)?\w+\s+\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2},?\s+\d{2,4} | \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2} )\z /x

ConverterEncoding = Encoding.find("UTF-8")

Converters = { integer: lambda { |f| Integer(f.encode(ConverterEncoding)) rescue f }, float: lambda { |f| Float(f.encode(ConverterEncoding)) rescue f }, numeric: [:integer, :float], date: lambda { |f| begin e = f.encode(ConverterEncoding) e =~ DateMatcher ? Date.parse(e) : f rescue # encoding conversion or date parse errors f end }, date_time: lambda { |f| begin e = f.encode(ConverterEncoding) e =~ DateTimeMatcher ? DateTime.parse(e) : f rescue # encoding conversion or date parse errors f end }, all: [:date_time, :numeric] }

HeaderConverters = { downcase: lambda { |h| h.encode(ConverterEncoding).downcase }, symbol: lambda { |h| h.encode(ConverterEncoding).downcase.gsub(/\s+/, "_"). gsub(/\W+/, "").to_sym } }

DEFAULT_OPTIONS = { col_sep: ",", row_sep: :auto, quote_char: '"', field_size_limit: nil, converters: nil, unconverted_fields: nil, headers: false, return_headers: false, header_converters: nil, skip_blanks: false, force_quotes: false }.freeze

Attributes

[R] col_sep

The encoded :col_sep used in parsing and writing. See CSV::new for details.

[R] row_sep

The encoded :row_sep used in parsing and writing. See CSV::new for details.

[R] quote_char

The encoded :quote_char used in parsing and writing. See CSV::new for details.

[R] field_size_limit

The limit for field size, if any. See CSV::new for details.

[R] encoding

The Encoding CSV is parsing or writing in. This will be the Encoding you receive parsed data in and/or the Encoding data will be written in.

[R] lineno

The line number of the last row read from this file. Fields with nested line-end characters will not affect this count.

Show files where this class is defined (1 file)
Register or log in to add new notes.