Tensorflow CSV File Read 2


꿈이있는 2017. 2. 22. 08:04

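Continuing from the previous post, I kept working on the example that reads a CSV file.

This time, instead of loading all of the data at once, the whole data set is shuffled and only part of it is fetched at a time.

When there is a lot of data, feeding all of it in at once takes a long time, so this is useful for repeatedly fetching a random subset of the data and training on it.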
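The idea of repeatedly drawing a small random subset can be sketched in plain Python, independent of TensorFlow. This is only an illustration: the `rows` list is a made-up stand-in for parsed CSV rows, and `sample_batch` is a hypothetical helper, not part of any library.

```python
import random


def sample_batch(rows, batch_size, rng=random):
    """Return batch_size rows drawn at random, without replacement."""
    return rng.sample(rows, batch_size)


# Hypothetical stand-in for 100 parsed CSV rows of five integers each.
rows = [[i, i + 1, i + 2, i + 3, i % 2] for i in range(100)]

# Each training step would draw a fresh random subset.
for step in range(3):
    batch = sample_batch(rows, batch_size=3)
    print(step, batch)
```

Unlike the queue-based pipeline used later in this post, this sketch samples independently at every step, so the same row can appear in more than one batch.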

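The data used this time is a CSV file with 100 rows. Each row has five integer columns; the code below treats the first four as features and the fifth as the label.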
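To try the example without the original file, a CSV of the same shape could be generated along these lines. The file name, value ranges, and the `write_sample_csv` helper are assumptions made for illustration; the actual contents of the author's file are not known.

```python
import csv
import os
import random
import tempfile


def write_sample_csv(path, num_rows=100, seed=0):
    """Write num_rows rows of five integers: four features and one label."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for _ in range(num_rows):
            features = [rng.randint(0, 9) for _ in range(4)]  # assumed range
            label = rng.randint(0, 1)                         # assumed range
            writer.writerow(features + [label])


# Hypothetical file name, written to the system temp directory.
path = os.path.join(tempfile.gettempdir(), "sampledata_example.csv")
write_sample_csv(path)
```

Every value is an integer because the reader in this post declares integer defaults for all five columns.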

https://www.tensorflow.org/programmers_guide/reading_data

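When I followed the official guide as written, I kept getting errors. After some searching I found that tf.local_variables_initializer() also has to be run, because the epoch counter that is created when num_epochs is set is a local variable. With that added, the code works.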

from __future__ import print_function

import tensorflow as tf


def read_my_file_format(filename_queue):
    # Read the CSV file one line at a time.
    reader = tf.TextLineReader()
    _, value = reader.read(filename_queue)
    # Five integer columns; the defaults also fix each column's dtype.
    record_defaults = [[1], [1], [1], [1], [1]]
    col1, col2, col3, col4, col5 = tf.decode_csv(value, record_defaults=record_defaults)
    # tf.stack replaces tf.pack, which was removed in TensorFlow 1.0.
    features = tf.stack([col1, col2, col3, col4])
    label = tf.stack([col5])
    return features, label


def input_pipeline(batch_size, num_epochs=None):
    min_after_dequeue = 10000
    capacity = min_after_dequeue + 3 * batch_size
    filename_queue = tf.train.string_input_producer(
        ["sampledata10002.csv"], num_epochs=num_epochs, shuffle=True)
    example, label = read_my_file_format(filename_queue)
    example_batch, label_batch = tf.train.shuffle_batch(
        [example, label], batch_size=batch_size, capacity=capacity,
        min_after_dequeue=min_after_dequeue)
    return example_batch, label_batch


# batch_size=3, num_epochs=1
examples, labels = input_pipeline(3, 1)

init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
sess = tf.Session()

# Initialize the variables (like the epoch counter).
sess.run(init_op)

# Start input enqueue threads.
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

try:
    while not coord.should_stop():
        example_batch, label_batch = sess.run([examples, labels])
        print(example_batch)
except tf.errors.OutOfRangeError:
    print('Done training -- epoch limit reached')
finally:
    # When done, ask the threads to stop.
    coord.request_stop()

# Wait for threads to finish.
coord.join(threads)
sess.close()

When the script runs, each print shows one batch, so three rows of feature values are printed at a time.

The important parameters in the code are num_epochs in

filename_queue = tf.train.string_input_producer(["sampledata10002.csv"], num_epochs=num_epochs, shuffle=True)

and batch_size in

example_batch, label_batch = tf.train.shuffle_batch([example, label], batch_size=batch_size, capacity=capacity, min_after_dequeue=min_after_dequeue)

num_epochs sets how many passes are made over the data, and batch_size sets how many rows are fetched at a time.

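For example, with a data file of 100 rows:

With num_epochs=1, batch_size=3, 3 rows are fetched 33 times (99 rows in total), and then print('Done training -- epoch limit reached') runs.

With num_epochs=2, batch_size=3, 3 rows are fetched 67 times (201 rows in total), and then print('Done training -- epoch limit reached') runs.

Note that 201 is more than 2 x 100 = 200, so the file probably held 101 rows rather than exactly 100; with exactly 100 rows, two epochs would give 66 full batches (198 rows). In either case, leftover rows that do not fill a whole batch are dropped.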
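The batch counts can be checked with integer division. This is a minimal sketch that assumes the final partial batch is discarded, which is the default behaviour of tf.train.shuffle_batch (allow_smaller_final_batch=False); `full_batches` is a hypothetical helper name.

```python
def full_batches(num_rows, num_epochs, batch_size):
    """Return (number of full batches, rows delivered) when leftovers are dropped."""
    total_rows = num_rows * num_epochs
    batches = total_rows // batch_size
    return batches, batches * batch_size


print(full_batches(100, 1, 3))  # (33, 99)
print(full_batches(100, 2, 3))  # (66, 198)
print(full_batches(101, 2, 3))  # (67, 201)
```

The last call shows that 67 batches and 201 rows over two epochs correspond to a file of 101 rows, while a file of exactly 100 rows would give 66 batches.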

The official guide explains min_after_dequeue and capacity well, so I quote its comments as they are.

  # min_after_dequeue defines how big a buffer we will randomly sample
  #   from -- bigger means better shuffling but slower start up and more
  #   memory used.
  # capacity must be larger than min_after_dequeue and the amount larger
  #   determines the maximum we will prefetch.  Recommendation:
  #   min_after_dequeue + (num_threads + a small safety margin) * batch_size
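To make the comment above concrete, here is a simplified pure-Python model of a shuffle buffer. It is a sketch under stated assumptions, not TensorFlow's implementation: `shuffle_buffer` and `recommended_capacity` are hypothetical names, the safety margin of 2 is an assumed value, and threading and the capacity limit itself are not modelled.

```python
import random


def recommended_capacity(min_after_dequeue, batch_size, num_threads=1, safety_margin=2):
    """Capacity following the recommendation quoted above."""
    return min_after_dequeue + (num_threads + safety_margin) * batch_size


def shuffle_buffer(items, min_after_dequeue, seed=0):
    """Yield items in randomized order through a buffer.

    An item is emitted only while more than min_after_dequeue elements are
    buffered; once the input is exhausted, the remainder is drained.
    """
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) > min_after_dequeue:
            yield buffer.pop(rng.randrange(len(buffer)))
    while buffer:
        yield buffer.pop(rng.randrange(len(buffer)))


# Same value as min_after_dequeue + 3 * batch_size in the post's code.
print(recommended_capacity(10000, 3))  # 10009

# With no buffering the input order is unchanged.
print(list(shuffle_buffer(range(5), min_after_dequeue=0)))  # [0, 1, 2, 3, 4]

# A larger buffer mixes rows from further apart in the input.
shuffled = list(shuffle_buffer(range(100), min_after_dequeue=10))
```

Each emitted item is picked from whatever is currently buffered, so a larger min_after_dequeue gives better mixing at the cost of more memory and a slower start, as the quoted comment says.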

License: Attribution, NonCommercial, NoDerivatives
