search_csv_data.rb

Heads up: This description was created by AI and might not be 100% accurate.

This Ruby script looks up records in CSV data and flags duplicates. It records the location (row index) of each unique key in the CSV data, then uses that record to find the matching row.

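Features:

Reading and parsing CSV files: uses the CSV library to read and parse a CSV file.
Key-based lookup: identifies each row of the CSV data by its value in a chosen key column and looks rows up by that value.
Duplicate detection: detects duplicated data by checking whether a key has already been seen.
Storing locations in a Hash: stores each unique key together with its position (row index) in a Hash.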

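Algorithm:

store_data_location(data:, key_index:) function:
- Input:
  - data: the CSV data to index, as an array whose elements are rows.
  - key_index: the column index of the lookup key.
- Algorithm:
  - Initializes an empty Hash, data_location.
  - Iterates over each element (row) of data together with its index.
  - Takes the element at position key_index in the row and converts it to a string.
  - Checks whether data_location already contains that key.
  - If it does, sets index to 0 (the duplicate marker).
  - Stores the key with that index in data_location, so the first occurrence records its row index and any repeat overwrites it with 0.
- Output: the data_location Hash (key to row index; 0 means duplicated).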
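The steps above can be traced on a small in-memory table. This is a minimal sketch that restates the function's logic; the rows array is hypothetical sample data, not the contents of sample.csv.

```ruby
# frozen_string_literal: true

# Hypothetical rows for illustration; the first row is the header.
rows = [
  %w[id email],
  %w[1 john@example.com],
  %w[2 ken@example.com],
  %w[3 john@example.com]
]

def store_data_location(data:, key_index:)
  data_location = {}
  data.each_with_index do |row, index|
    key = row[key_index].to_s
    # A key that was already seen is overwritten with 0, the duplicate marker.
    index = 0 if data_location.key?(key)
    data_location[key] = index
  end
  data_location
end

locations = store_data_location(data: rows, key_index: 1)
p locations
# prints {"email" => 0, "john@example.com" => 0, "ken@example.com" => 2}
```

The header value ("email") also maps to 0 because the header sits at row 0. Since no data row lives at index 0, the value 0 is free to act as the duplicate marker, at the cost that a lookup for the header name itself would be reported as a duplicate.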
search_data_location(csv_file_path: 'sample.csv', target_key: 'id') function:
- Input:
  - csv_file_path: path to the CSV file (default: 'sample.csv').
  - target_key: name of the lookup key column (default: 'id').
- Algorithm:
  - Reads the CSV file with CSV.read() into an array of rows.
  - Takes the first row as the header, builds headers_hash (column name to column index), and looks up the index of target_key in it.
  - Calls store_data_location(), passing csv_data_arr (the CSV data, header row included) and key_column_index_number (the index of target_key).
- Output: the data_location Hash (key to row index; 0 means duplicated).
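The header lookup can be sketched on its own. This example parses an inline string with CSV.parse instead of reading a file so that it is self-contained; the csv_text contents are an assumption for illustration. It uses Hash#fetch rather than [] so that a missing column name fails immediately.

```ruby
# frozen_string_literal: true

require 'csv'

# Hypothetical CSV contents for illustration.
csv_text = <<~CSV
  id,name,email
  1,John,john@example.com
  2,Ken,ken@example.com
CSV

csv_data_arr = CSV.parse(csv_text)

# Map each column name to its position in the header row.
headers = csv_data_arr.first
headers_hash = headers.each_with_index.to_h

# fetch raises KeyError for an unknown column instead of returning nil.
key_column_index_number = headers_hash.fetch('email')

p headers_hash
p key_column_index_number
```

With [] instead of fetch, an unknown column name yields nil, and the failure only surfaces later when a row is indexed with nil.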
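main function:
- Calls search_data_location to obtain the data_index Hash.
- Defines sample lookup keys (keys_for_search).
- Iterates over keys_for_search and, for each key, uses data_index to find the corresponding row (record) in the CSV data.
- Checks whether the key is flagged as a duplicate (index 0) and prints either a duplicate message or the matching row. A key that is missing from the index needs separate handling, because data_index returns nil for it.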
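The lookup step has three possible outcomes per key: missing, duplicated, or found. The sketch below makes all three explicit. The lookup helper, the rows, and the prebuilt data_index are hypothetical and exist only for illustration.

```ruby
# frozen_string_literal: true

# Hypothetical rows and a matching prebuilt index (0 marks a duplicate).
rows = [
  %w[id email],
  %w[1 john@example.com],
  %w[2 ken@example.com],
  %w[3 john@example.com]
]
data_index = { 'email' => 0, 'john@example.com' => 0, 'ken@example.com' => 2 }

# Hypothetical helper: returns the matching row, or a message for the other cases.
def lookup(rows, data_index, key)
  location = data_index[key]
  return "#{key} was not found." if location.nil?
  return "#{key} is duplicated." if location.zero?

  rows[location]
end

keys = %w[john@example.com ken@example.com maria@example.com]
results = keys.map { |key| lookup(rows, data_index, key) }
results.each { |result| p result }
```

Checking for nil first matters: indexing an Array with nil raises TypeError, so a key that is absent from the index has to be caught before the row is fetched.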

Overall algorithm:

Read the data from the CSV file.
Index the data by key.
Look up the location of the data by key.
Check the lookup result and print it.

Ruby code snippet

# frozen_string_literal: true

# Look up records in CSV data and flag duplicates

require 'csv'

# Store the location of each unique key in the data
#
# @param data [Array<Array<String>>] The data to search through
# @param key_index [Integer] The index of the key to use for lookup
# @return [Hash<String, Integer>] A hash mapping each unique key to its index in the data
def store_data_location(data:, key_index:)
  data_location = {}

  data.each_with_index do |element, index|
    key = element[key_index].to_s

    # index = 0: duplicated
    # (when called from search_data_location, row 0 is the header row,
    # so 0 never points at a data row)
    index = 0 if data_location.key?(key)
    data_location.store(key, index)
  end
  data_location
end

# Search for the location of each unique key in the data
#
# @param csv_file_path [String] The path to the CSV file to search
# @param target_key [String] The key to use for lookup
# @return [Hash<String, Integer>] A hash mapping each unique key to its index in the data
def search_data_location(csv_file_path: 'sample.csv',
                         target_key: 'id')
  csv_data_arr = CSV.read(csv_file_path)

  # key column index number
  headers = csv_data_arr.first
  headers_hash = headers.each_with_index.to_h #=> {'id' => 0, 'name' => 1, ...}
  # fetch raises KeyError when the column is missing, instead of passing nil along
  key_column_index_number = headers_hash.fetch(target_key)

  store_data_location(data: csv_data_arr, key_index: key_column_index_number)
end

def main
  # search preparation
  ## find the index beforehand
  data_index = search_data_location(csv_file_path: 'sample.csv', target_key: 'email')
  ## load the CSV data into memory and assign it to a variable.
  csv_data_arr = CSV.read('sample.csv')

  # example: search
  puts '=== SEARCH ==='
  keys_for_search = ['john@example.com', 'ken@example.com', 'maria@example.com']
  keys_for_search.each do |key_for_search|
    location_index = data_index[key_for_search]
    if location_index.nil?
      # key is absent from the index; csv_data_arr[nil] would raise TypeError
      p "#{key_for_search} was not found."
    elsif location_index.zero?
      # TODO: modify as needed
      p "#{key_for_search} is duplicated."
    else
      # TODO: modify as needed
      p csv_data_arr[location_index]
    end
  end
end

main if __FILE__ == $PROGRAM_NAME

Executed with Ruby 3.4.9.