# Question about co-occurrence matrix formation


Hello, everyone. I have a question about text analysis. After finishing the NLP preprocessing, I want to build a co-occurrence matrix that counts how often words appear near each other. I used the code below. It used to work well, but it suddenly stopped working, so I'm posting this question. Thank you for reviewing it.

```python
import collections

import numpy as np
import pandas as pd


def co_occurrence(sentences, window_size):
    d = collections.defaultdict(int)
    vocab = set()
    for text in sentences:
        # preprocessing (use a tokenizer instead)
        text = text.lower().split()
        # iterate over the tokens of each sentence
        for i in range(len(text)):
            token = text[i]
            vocab.add(token)  # add the token to the vocabulary
            next_token = text[i + 1 : i + 1 + window_size]
            for t in next_token:
                key = tuple(sorted([t, token]))
                d[key] += 1

    # formulate the dictionary into a DataFrame
    vocab = sorted(vocab)  # sort vocab
    df = pd.DataFrame(data=np.zeros((len(vocab), len(vocab)), dtype=np.int16),
                      index=vocab,
                      columns=vocab)
    for key, value in d.items():
        # the matrix is symmetric, so fill both (row, col) and (col, row)
        df.at[key[0], key[1]] = value
        df.at[key[1], key[0]] = value
    return df


df = pd.read_csv('data.csv', encoding='utf-8')

# The data file is uploaded here: http://naver.me/x1eYJPQ2

# remove quotes, commas, middle dots and list brackets left by the NLP step
# (regex=False so that "[" and "]" are matched literally)
for ch in ["'", ",", "･", "・", "[", "]"]:
    df['nlp'] = df['nlp'].str.replace(ch, "", regex=False)
corpus = df.corpus.tolist()

df = co_occurrence(corpus, 3)

df.to_csv('co_occurrence.csv', encoding='utf-8')
```

2022-09-20 15:49
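To see what the windowed counting inside `co_occurrence` produces on its own, here is a minimal stdlib-only sketch of the same logic without pandas. `count_pairs` is a hypothetical helper name, and whitespace splitting stands in for a real tokenizer, as in the question's code.

```python
from collections import Counter


def count_pairs(sentences, window_size):
    """Count unordered word pairs that occur within `window_size` tokens of each other."""
    counts = Counter()
    for text in sentences:
        tokens = text.lower().split()  # whitespace split stands in for a real tokenizer
        for i, token in enumerate(tokens):
            # look ahead at most `window_size` tokens to the right of `token`
            for other in tokens[i + 1 : i + 1 + window_size]:
                # sorting makes ("b", "a") and ("a", "b") the same key
                counts[tuple(sorted((token, other)))] += 1
    return counts


pairs = count_pairs(["a b c a"], window_size=2)
print(pairs)
```

Because each key is a sorted pair, every count belongs to both cells (row, col) and (col, row) of the matrix, which is why the DataFrame is filled symmetrically. A word can also pair with itself: with `window_size=3`, the two `a` tokens in `"a b c a"` produce the key `("a", "a")`, which lands on the diagonal.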

I got it working somehow.

Change the encoding:

```python
# df = pd.read_csv('data2.csv', encoding='utf-8')
df = pd.read_csv('data2.csv', encoding='euc-kr')
```
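If you are not sure which encoding a CSV was saved in (files saved from Excel on Korean Windows are often EUC-KR/CP949 rather than UTF-8), one option is to try candidate encodings on the raw bytes before calling `pd.read_csv`. This is a minimal sketch; `guess_encoding` and its candidate list are assumptions, not part of pandas. CP949 is listed last because it covers Hangul syllables that EUC-KR lacks. Note that a successful decode only means the bytes are *valid* in that encoding, not that it is the one the file was written in.

```python
def guess_encoding(raw, candidates=("utf-8", "euc-kr", "cp949")):
    """Return the first candidate encoding that decodes `raw` (bytes) without errors."""
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # bytes are invalid in this encoding; try the next one
        return enc
    raise ValueError(f"none of {candidates} could decode the data")


# Hypothetical usage with the file from the question:
# with open("data2.csv", "rb") as f:
#     enc = guess_encoding(f.read())
# df = pd.read_csv("data2.csv", encoding=enc)
```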

Replace the column name that does not exist (`corpus`) with one that does (`nlp`):

```python
# corpus = df.corpus.tolist()
corpus = df.nlp.tolist()
```
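Attribute access like `df.corpus` fails with an `AttributeError` when the DataFrame has no such column, and the message does not say which columns *do* exist. A small guard can make that mistake easier to spot. This is a sketch; `column_as_list` is a hypothetical helper, not a pandas function.

```python
import pandas as pd


def column_as_list(df, name):
    """Return df[name] as a list, or fail with the names of the columns that exist."""
    if name not in df.columns:
        raise KeyError(f"column {name!r} not found; available columns: {list(df.columns)}")
    return df[name].tolist()


example = pd.DataFrame({"nlp": ["word analysis", "text mining"]})
corpus = column_as_list(example, "nlp")  # works
# column_as_list(example, "corpus")      # KeyError listing ['nlp']
```

Printing `df.columns` right after `pd.read_csv` is an even quicker check, and it also reveals columns whose names came out garbled because of a wrong encoding.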

2022-09-20 15:49
