I needed car data for a Ruby on Rails project. I found a GitHub repository where someone generously shared 14,000 rows of car model, makes, and year.
The challenge was to import it into a Rails database running Postgresql.
I used Ruby to format each row to an array and then exported that into a CSV. Here’s how I did it.
Cleaning Data with Ruby
I created a new file called
car_types.txt in the Rails project/lib folder.
I selected a few records from the github repo and pasted them into the
car_types.txt file. We’ll process a few records now and wait until the end to process the rest.
I often use a debugger when I’m cleaning up data. Since I’m working in a Ruby on Rails project, I’ll be using byebug debugger.
In Rails, we use
seeds.rb file to populate the database with test data. It makes sense to do data clean up there as well.
Let’s open up the
car_types.txt file with the
File class in Ruby. We’re going to read data line by line and transform it into CSV format.
File.open('lib/car_types.txt').each do |line| end
I’m going set a byebug breakpoint so that the application halts within the
File.open block and we can see what is happening.
File.open('lib/car_types.txt').each do |line| byebug end
# Run the seeds.rb from the console. rails db:seed
# In byebug mode. line # "(1926, 'Chrysler', 'Imperial'),\n" # We want to get each line looking like this: # "1926,Chrysler,Imperial"
#create an empty array to store each line of record. cars =  File.open('lib/car_types.txt').each do |line| line # Remove the left parenthesis line.gsub!("(", "") # Remove the right parenthesis line.gsub!(")", "") # Remove the single quotes line.gsub!("'", "") # Remove the new line (\n) line.chomp! # Remove the comma at the end. line = line[0...-1] # Remove the any spaces. line.gsub!(", ", ",") #line is now properly formatted. line # "1926, Chrysler, Imperial" #Add the line into the cars array. cars.push(line) end
We successfully formatted each line and wrote all the results to the
cars array. Let’s write that array to a csv file.
Let’s create a file
car_list_new.csv in the project/lib folder.
I won’t mention byebug with every example I show here but I highly encourage you to explore your code by using it to set breakpoints and see what Ruby is doing.
File.open block which will open the new
car_list_new.csv file and write the formatted car data to it.
File.open('lib/car_list_new.csv', 'a') do |file| file.write(line) file.write("\n") end
car_list_new.csv and you’ll see csv formatted car data on each line.
1926,Chrysler,Imperial 1948,Citroën,2CV 1950,Hillman,Minx Magnificent 1953,Chevrolet,Corvette 1954,Chevrolet,Corvette 1954,Cadillac,Fleetwood 1955,Chevrolet,Corvette 1955,Ford,Thunderbird 1956,Chevrolet,Corvette 1957,Chevrolet,Corvette 1957,BMW,600 1958,Chevrolet,Corvette 1958,BMW,600 1958,Ford,Thunderbird
Writing CSV to Postgresql
Let’s write the results to the database. In
seeds.db, we’ll open
car_list_new.csv and read each line within the block.
File.open('lib/car_list_new.csv').each do |line| line # "Year,Make,Model\n" #Let's remove the new line (\n) and split each item into an array. vehicle = line.chomp.split(",") #Create a record in the database from each line. Car.create(year: vehicle.to_i, make: vehicle, model: vehicle) end
I view the database with Postico and see the populated car records.
Inserting all car records
Let’s insert all 14,000 cars in the database. Copy and paste all the car records into
car_types.txt and delete all text from
car_list_new.csv file. Before we run all our code in
seeds.db, let’s clear out all the records from the
car table by running
Here’s how the final
seeds.db file looks like.
Car.delete_all cars =  File.open('lib/car_types.txt').each do |line| line line.gsub!("(", "") line.gsub!(")", "") line.gsub!("'", "") line.chomp! line = line[0...-1] line.gsub!(", ", ",") cars << line end File.open('lib/car_list_new.csv').each do |line| vehicle = line.chomp.split(",") Car.create(year: vehicle.to_i, make: vehicle, model: vehicle) end
It took about 15 seconds to populate the database with over 14,000 rows of car data.
The way I’m writing to the database is inefficient. It’s slow. We’re pinging the database to create a row after reading each line of records. That’s over 14,000 hits to the database. You can use the activerecord-import gem to bring the time down to 2 seconds.
However, it doesn’t matter. I’m using this data in development environment and waiting 15 seconds isn’t a deal breaker. As a side benefit, I don’t have to use extra gems.