Deep Learning on Graphs and Networks
Deep learning on graphs and networks is a rapidly growing field with applications in social network analysis, recommendation systems, and drug discovery. In this setting, the data is represented as a graph or network, where nodes represent entities and edges represent relationships between them. Here, we will demonstrate how to use Graph Convolutional Networks (GCNs) for node classification on the Cora dataset, a citation network in which each node is a scientific publication described by a 1,433-dimensional bag-of-words feature vector and labelled with one of seven topics.
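To make that representation concrete, here is a tiny made-up example: three people (nodes) and the friendships (edges) between them, written first as node and edge lists and then as a plain adjacency list.
# Three entities (people) and the relationships (friendships) between them
nodes = ["alice", "bob", "carol"]
edges = [("alice", "bob"), ("bob", "carol")]
# The same graph as an adjacency list, mapping each node to its neighbours
adjacency = {"alice": ["bob"], "bob": ["alice", "carol"], "carol": ["bob"]}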
First, we need to install the necessary packages:
!pip install tensorflow==2.5.0
!pip install tensorflow-datasets==4.3.0
!pip install networkx==2.6.3
!pip install matplotlib scipy
Next, we can load the Cora dataset and convert it into a networkx graph:
import tensorflow_datasets as tfds
import networkx as nx
# Load Cora dataset
cora, _ = tfds.load('cora', split='train', with_info=True)
# Convert to networkx graph
G = nx.Graph()
for i in range(len(cora['adjacency_list'])):
    G.add_node(i, label=int(cora['label'][i].numpy()))
    for j in cora['adjacency_list'][i]:
        G.add_edge(i, int(j))
We can then visualize the graph with networkx, coloring each node by its class label (with several thousand nodes, drawing a text label on every node would be unreadable):
import matplotlib.pyplot as plt
# Draw the graph, coloring nodes by class label
labels = {i: int(cora['label'][i].numpy()) for i in range(len(cora['label']))}
nx.draw(G, node_color=[labels[node] for node in G.nodes()], node_size=20, with_labels=False)
plt.show()
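Before preprocessing, it is also worth sanity-checking the graph we just built. The following optional snippet uses only plain networkx calls on the graph G constructed above:
# Basic statistics of the citation graph
print("Nodes:", G.number_of_nodes())
print("Edges:", G.number_of_edges())
degrees = [d for _, d in G.degree()]
print("Average degree:", sum(degrees) / len(degrees))
print("Connected components:", nx.number_connected_components(G))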
Next, we need to preprocess the graph data for use in a GCN. This involves computing the normalized adjacency matrix (with self-loops added) and the node feature matrix:
import numpy as np
# Compute the normalized adjacency matrix
adj_mat = nx.to_numpy_array(G)
adj_mat = adj_mat + np.eye(adj_mat.shape[0])  # Add self-loops
d_inv_sqrt = np.diag(1 / np.sqrt(np.sum(adj_mat, axis=1)))  # D^-1/2
adj_norm = d_inv_sqrt @ adj_mat @ d_inv_sqrt  # D^-1/2 (A + I) D^-1/2
# Compute the node feature matrix (1433 bag-of-words features per paper)
features = np.zeros((len(cora['label']), 1433))
for i in range(len(cora['feature'])):
    features[i] = cora['feature'][i].numpy()
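Cora is small enough that a dense adjacency matrix works, but for larger graphs the same normalization is usually done with sparse matrices. A rough sketch of the equivalent computation with scipy.sparse (the same math as above, just a different storage format) might look like this:
import scipy.sparse as sp
# Sparse version of D^-1/2 (A + I) D^-1/2
adj_sp = nx.to_scipy_sparse_matrix(G, format='coo') + sp.eye(G.number_of_nodes())
deg = np.asarray(adj_sp.sum(axis=1)).flatten()
d_inv_sqrt_sp = sp.diags(1.0 / np.sqrt(deg))
adj_norm_sp = d_inv_sqrt_sp @ adj_sp @ d_inv_sqrt_sp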
We can then build and train a simplified GCN: one step of feature propagation over the normalized adjacency matrix, followed by ordinary dense layers:
from tensorflow.keras.layers import Input, Dropout, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import CategoricalAccuracy
from tensorflow.keras.utils import to_categorical
# Propagate node features over the graph (one hop of neighborhood aggregation)
features_prop = adj_norm @ features
# Build the model
input_layer = Input(shape=(features_prop.shape[1],))
hidden_layer = Dropout(0.5)(Dense(64, activation='relu')(input_layer))
output_layer = Dense(7, activation='softmax')(hidden_layer)  # 7 Cora classes
model = Model(inputs=input_layer, outputs=output_layer)
model.compile(optimizer=Adam(learning_rate=0.01), loss=CategoricalCrossentropy(), metrics=[CategoricalAccuracy()])
# Train the model
y = to_categorical(cora['label'].numpy())
history = model.fit(features_prop, y, batch_size=16, epochs=100, validation_split=0.1)
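The model above is a simplified GCN: it aggregates neighborhood information once, outside the network, and then applies ordinary dense layers. A full GCN instead applies the propagation adj_norm @ H @ W inside every layer. Below is a minimal sketch of such a layer as a custom Keras layer; the class name GraphConv and the two-layer stacking are illustrative choices, not part of Keras:
import tensorflow as tf

class GraphConv(tf.keras.layers.Layer):
    """One GCN layer: H' = activation(adj_norm @ H @ W)."""
    def __init__(self, units, activation=None):
        super().__init__()
        self.units = units
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='glorot_uniform', trainable=True)

    def call(self, h, adj):
        # adj is the (dense) normalized adjacency matrix computed earlier
        return self.activation(tf.matmul(adj, tf.matmul(h, self.w)))

# Two stacked GCN layers applied to the whole graph at once
adj_tf = tf.constant(adj_norm, dtype=tf.float32)
x = tf.constant(features, dtype=tf.float32)
gcn1 = GraphConv(64, activation='relu')
gcn2 = GraphConv(7, activation='softmax')
out = gcn2(gcn1(x, adj_tf), adj_tf)  # shape: (num_nodes, 7)
Training a model like this is usually done full-batch on the whole graph with a node mask for the training set, which is why the simpler propagate-then-classify variant above fits more naturally into model.fit.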
Finally, we can use the trained model to make predictions on the test nodes. Since the model consumes propagated features, we build and normalize the test graph's adjacency matrix in the same way before predicting:
# Load the test split and its node features
cora_test, _ = tfds.load('cora', split='test', with_info=True)
features_test = np.zeros((len(cora_test['label']), 1433))
for i in range(len(cora_test['feature'])):
    features_test[i] = cora_test['feature'][i].numpy()
# Build and normalize the test adjacency matrix, as before
G_test = nx.Graph()
for i in range(len(cora_test['adjacency_list'])):
    G_test.add_node(i)
    for j in cora_test['adjacency_list'][i]:
        G_test.add_edge(i, int(j))
adj_test = nx.to_numpy_array(G_test) + np.eye(G_test.number_of_nodes())
d_inv_sqrt_test = np.diag(1 / np.sqrt(np.sum(adj_test, axis=1)))
adj_norm_test = d_inv_sqrt_test @ adj_test @ d_inv_sqrt_test
# Propagate test features and predict a class for every node
y_test_pred = model.predict(adj_norm_test @ features_test)
y_test_pred = np.argmax(y_test_pred, axis=1)
# Compute test accuracy
y_test_true = cora_test['label'].numpy()
test_accuracy = np.mean(y_test_pred == y_test_true)
print('Test accuracy:', test_accuracy)
This code loads the test split of the Cora dataset and preprocesses the node features and adjacency matrix in the same way as before. It then propagates the test features over the test graph, uses the trained model to predict a class for every test node, and computes the test accuracy.
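Overall accuracy can hide weak classes. Assuming scikit-learn is available (it is not among the packages installed above), a per-class breakdown is a quick extra check:
from sklearn.metrics import classification_report, confusion_matrix
# Per-class precision/recall and the confusion matrix for the seven Cora topics
print(classification_report(y_test_true, y_test_pred))
print(confusion_matrix(y_test_true, y_test_pred))
This shows which topics the classifier handles well and which pairs of topics it tends to confuse.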