# GoodreadsProject **Repository Path**: ljyslyc/Goodreads ## Basic Information - **Project Name**: GoodreadsProject - **Description**: No description available - **Primary Language**: Python - **License**: Not specified - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2020-03-01 - **Last Updated**: 2021-11-02 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README Comparison of Three Recommender Systems on Goodreads Dataset ------------------------------------------------------------- Jay Shi, William Su Here's a guide to help you navigate through our files: 1. To understand all of our data, models, and implementation go to: 1. FinalReport.pdf 2. For Data and Model Visualization, go to EDA&ModelViz.ipynb There we perform a number of tasks including: 1. Visualizing the distrubtion of the average ratings 2. Comparing the three matrices (shelved, isRead, and rating) 2. Visualizing the three models (user-user, matrix factorization, neural network matrix factorization) through different plots 3. Comparing all the models' MSE performance together 3. Data Cleaning, go to: 1. Folder: ExtractingMatrices - json_to_csv_converter.py is a script we found from: https://github.com/Yelp/dataset-examples/blob/master/json_to_csv_converter.py it converts the raw data goodreads_interactions_fantasy_paranormal.json.gz to goodreads_interactions_fantasy_paranormal.csv file - extractThreeMatrices.ipynb notebook will show you how we converted a goodreads_interactions_fantasy_paranormal.csv file data into three sparse matrices: 1. rating_matrix_fantasy.npz 2. isRead_matrix_fantasy.npz 3. shelved_matrix_fantasy.npz - BOOK_ID_TO_INT_fantasy.json and USER_ID_TO_INT_fantasy.json are json files that we saved from the extractThreeMatrices.ipynb notebook - in the notebook, we mapped the original id's (strings) to a specific range of int's that we use. this json file helps us map these int's back to the original id's 2. Folder: ShrinkMatrices - the shrinkThreeMatricies.ipynb notebook will show you how we shrunk our original 3 sparse matrices 1. rating_matrix_shrunk.npz 2. isRead_matrix_shrunk.npz 3. shelved_matrix_shrunk.npz - we decided to shrink the matrices because they were too large and took too much computing power to run 4. UserUser model, go to 1. Folder: UserUserModel - sweepUserUSer.ipynb notebook will show you how we build our weighted user-user model and also how we swept through the parameters w1 and w2: 1. user_user_sweep1.json user_user_sweep7.json contains json of key (w1, w2) mapped to test_data MSE's 5. For MatrixFactorization model go to: 1. Folder: MatrixFactorizationModel - sweep_params_matrix_fact.ipynb notebook will show you how we built the matrix factorizaton model based on a UV decomposition and alternating least squares and show you how we swept through the parameters K and reg 1. matrix_factorization_all_params.json contains a json of key (K, reg) mapped to train_data MSE's and test_data MSE's 2. best_param_50_niter.npy contains a numpy array that contains test_data MSE from iteration 1 to iteration 50 using the best tuned K, reg pair we found from matrix_factorization_all_params.json 6. To understand how we built our neural network-based matrix factorization model go to 1. Folder: NeuralNetworkModel - nn_factorization.ipynb notebook will walk you through how the model is built - nn_history is the saved data of a model that we built - nn_model is the saved model that we built - we use this in our EDA&ModelViz notebook to visualize the model and data as it takes too long to retrain a model - nn_sweep.json is a json mapping latent spaces k that we swept to the test MSE