How to find a sequence from a txt file?

Gabriela on 15 Mar 2024
Commented: Voss on 15 Mar 2024
I'm trying to identify a specific primer sequence from three different text files. I already have this code:
clear;
clc;
% Open the file
fileID = fopen('cDNA1-1.txt', 'r');
% Read the DNA sequence from the file
dna_sequence = strfind(fileID, '%s');
%
DNAsequence=string(dna_sequence)
% Define the primer sequence
primer_sequence = 'TACG';
% Find the location of the primer sequence in the DNA sequence
primer_location = strfind(DNAsequence, primer_sequence);
% Display the location of the primer sequence
if isempty(primer_location)
disp('Primer sequence not found in the DNA sequence.');
else
disp(['Primer sequence found at position(s): ', num2str(primer_location)]);
end
However, for some reason my dna_sequence variable is empty, and I keep getting that the sequence is not found in any of the text files. I know that that's wrong, so I need help.
I will include the three txt files along with my code.
Thank you!

Accepted Answer

Voss on 15 Mar 2024
Edited: Voss on 15 Mar 2024
The use of strfind here is not correct:
% Read the DNA sequence from the file
dna_sequence = strfind(fileID, '%s');
It looks like you meant to use fscanf:
% Read the DNA sequence from the file
dna_sequence = fscanf(fileID, '%s');
With that change, the code appears to work (but don't forget to fclose the file once you have read it!). Note that the sequence 'TACG' is found in files 2-1 and 3-1 but not in 1-1.
% Open the file
fileID = fopen('cDNA2-1.txt', 'r');
% Read the DNA sequence from the file
dna_sequence = fscanf(fileID, '%s');
% Close the file
fclose(fileID);
% Convert the character vector to a string scalar
DNAsequence=string(dna_sequence)
DNAsequence = "GACGCGGCGCAGGCGGCGGGAGTGCGAGCTGGGCCCGTGTTTCGGCCGCCGCCATGGCCGCGGTGGACCTGGAGAAGCTGCGGGCGTCGGGCGCGGGCAAGGCCATCGGCGTCCTGACCAGCGGCGGCGACGCGCAAGGTCCCCTGACAAGCCCACCAGGCCCCCTGCTGAGATGGCTGTGACCCTGGGCTGACCCGCCCAGTGGCACATTGACTCCGCCTGGAGCTGGGGAGACCAGAGAGGCCCTGTGGTTGGACGGTGGCCTGGGTGCGCTGCTCCTGCCCTCTCCTTGCCCTGCCTCAGCTGCTGCCTGCCAGAGGCGTGGCACCTCACCTCACACCTGCTCCCTGCTGCTGAGCCCCACGCCAAGCTGGAGAGCGGATGAGAAGCATGTGTAACCAGGGTAGAGGTCGAGAGTCCTCTCGTGGGGGTCTCCATGTTCAAGGGAGCTGCCGAGGCTTGAGCAGGAGCCCCCAGCAGGAAACTGGCTTTGCCAAGGCCCCCGCTGGGACAGACTGTTTCTTTCACTGCAGTCCTGGGAGCCGAGGGCAAGGGGACAGGAAAGAGGAAGTGACCTCAGAGCCTGGTGGCACCAGCATCATGTCCAGGCTGGGGGGCATGAACGCTGCTGTCCGGGCTGTGACGCGCATGGGCATTTATGTGGGTGCCAAAGTCTTCCTCATCTACGAGGGCTATGAGGGCCTCGTGGAGGGAGGTGAGAACATCAAGCAGGCCAACTGGCTGAGCGTCTCCAACATCATCCAGCTGGGCGGCACTATCATTGGCAGCGCTCGCTGCAAGGCCTTTACCACCAGGGAGGGGCGCCGGGCAGCGGCCTACAACCTGGTCCAGCACGGCATCACCAACCTGTGCGTCATCGGCGGGGATGGCAGCCTCACAGGTGCCAACATCTTCCGCAGCGAGTGGGGCAGCCTGCTGGAGGAGCTGGTGGCGGAAGGTAAGATCTCAGAGACTACAGCCCGGACCTACTCGCACCTGAACATCGCGGGCCTAGTGGGCTCCATCGATAACGACTTCTGCGGCACCGACATGACCATCGGCACGGACTCGGCCCTCCACCGCATCATGGAGGTCATCGATGCCATCACCACCACTGCCCAGAGCCACCAGAGGACCTTCGTGCTGGAAGTGATGGGCCGGCACTGCGGGTACCTGGCGCTGGTATCTGCACTGGCCTCAGGGGCCGACTGGCTGTTCATCCCCGAGGCTCCACCCGAGGACGGCTGGGAGAACTTCATGTGTGAGAGGCTGGGTGAGACTCGGAGCCGTGGGTCCCGACTGAACATCATCATCATCGCTGAGGGTGCCATTGACCGCAACGGGAAGCCCATCTCGTCCAGCTACGTGAAGGACCTGGTGGTTCAGAGGCTGGGCTTCGACACCCGTGTAACTGTGCTGGGCCACGTGCAGCGGGGAGGGACGCCCTCTGCCTTCGACCGGATCCTGAGCAGCAAGATGGGCATGGAGGCGGTGATGGCGCTGCTGGAAGCCACGCCTGACACGCCGGCCTGCGTGGTCACCCTCTCGGGGAACCAGTCAGTGCGGCTGCCCCTCATGGAGTGCGTGCAGATGACCAAGGAAGTGCAGAAAGCCATGGATGACAAGAGGTTTGACGAGGCCACCCAGCTCCGTGGTGGGAGCTTCGAGAACAACTGGAACATTTACAAGCTCCTCGCCCACCAGAAGCCCCCCAAGGAGAAGTCTAACTTCTCCCTGGCCATCCTGAATGTGGGGGCCCCGGCGGCTGGCATGAATGCGGCCGTGCGCTCGGCGGTGCGGACCGGCATCTCCCATGGACACACAGTATACGTGGTGCACGATGGCTTCGAAGGCCTAGCCAAGGGTCAGGTGCAAGAAGTAGGCTGGCACGACGTGGCCGGCTGGTTGGGGCGTGGTGGCTCCATGCTGGGGACCAAGAGGACCCTGCCCAAGGGCCAGCTGGAGTCCATTGTGGAGAACATCCG
CATCTATGGTATTCACGCCCTGCTGGTGGTCGGTGGGTTTGAGGCCTATGAAGGGGTGCTGCAGCTGGTGGAGGCTCGCGGGCGCTACGAGGAGCTCTGCATCGTCATGTGTGTCATCCCAGCCACCATCAGCAACAACGTCCCTGGCACCGACTTCAGCCTGGGCTCCGACACTGCTGTAAATGCCGCCATGGAGAGCTGTGACCGCATCAAACAGTCTGCCTCGGGGACCAAGCGCCGTGTGTTCATCGTGGAGACCATGGGGGGTTACTGTGGCTACCTGGCCACCGTGACTGGCATTGCTGTGGGGGCCGACGCCGCCTACGTCTTCGAGGACCCTTTCAACATCCACGACTTAAAGGTCAACGTGGAGCACATGACGGAGAAGATGAAGACAGACATTCAGAGGGGCCTGGTGCTGCGGAACGAGAAGTGCCATGACTACTACACCACGGAGTTCCTGTACAACCTGTACTCATCAGAGGGCAAGGGCGTCTTCGACTGCAGGACCAATGTCCTGGGCCACCTGCAGCAGGGTGGCGCTCCAACCCCCTTTGACCGGAACTATGGGACCAAGCTGGGGGTGAAGGCCATGCTGTGGTTGTCGGAGAAGCTGCGCGAGGTTTACCGCAAGGGACGGGTGTTCGCCAATGCCCCAGACTCGGCCTGCGTGATCGGCCTGAAGAAGAAGGCGGTGGCCTTCAGCCCCGTCACTGAGCTCAAGAAAGACACTGATTTCGAGCACCGCATGCCACGGGAGCAGTGGTGGCTGAGCCTGCGGCTCATGCTGAAGATGCTGGCACAATACCGCATCAGTATGGCCGCCTACGTGTCAGGGGAGCTGGAGCACGTGACCCGCCGCACCCTGAGCATGGACAAGGGCTTCTGAGGCCAGCCATGCCCACGCCCCTCCCCAGCCCCCACCCATGCCAGCGCAGCGCCAGGGCTCAGATGGGGCCTGGGCTGTTGTGTCTGGAGCCTGCAGGCAGGTGGGGGCTGCGTCCCTGCTCAGCCCATCCCCTGCCTCTATCCCTGGCCACCTGCCAGGCCTCCCTCGGGCTGGTGTCTTGAGACCAGCCTGCCAGGCCCTCCAGCAGGAGGACAGAGTGCCCTGGGGCATCCACCTTCCTGCCCAGGGGACGTGGCGCTGTCGGTGTTTGGAGGCTGCTGCCCCCTGGCTTTGGCGCCCCATGGGCCCTCAGCGTCTCCCCATGCTGGGCTCACTACATGGGCCAGCCCTTGCTCTACCTGGCCGGTAGGCTGCTGGCGCCTAGGTTGTGTTGAGAGGGGGATGCCCCTGGCCCTGCCTCACTGTGACCTGCTCCTGCCCACGTGCAGCACCTGTCACCTTTTCTAGAAATAAAATCACCCTGACTGTGGGGTGCATCGGTCTCCGGAGA"
% Define the primer sequence
primer_sequence = 'TACG';
% Find the location of the primer sequence in the DNA sequence
primer_location = strfind(DNAsequence, primer_sequence)
primer_location = 1×6
685 1363 1828 2071 2308 2812
% Display the location of the primer sequence
if isempty(primer_location)
disp('Primer sequence not found in the DNA sequence.');
else
disp(['Primer sequence found at position(s): ', num2str(primer_location)]);
end
Primer sequence found at position(s): 685 1363 1828 2071 2308 2812
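The corrected code handles one file at a time. A minimal sketch of the same steps applied to all three files in a loop follows. The file name cDNA3-1.txt is an assumption inferred from the "3-1" mentioned in this answer, and the variable names file_names and primer_locations are hypothetical.

```matlab
% Files to search. cDNA1-1.txt and cDNA2-1.txt appear in this thread;
% cDNA3-1.txt is an assumed name for the third file.
file_names = {'cDNA1-1.txt', 'cDNA2-1.txt', 'cDNA3-1.txt'};
primer_sequence = 'TACG';
primer_locations = cell(size(file_names));

for k = 1:numel(file_names)
    fileID = fopen(file_names{k}, 'r');
    if fileID == -1
        % fopen returns -1 when the file cannot be opened
        fprintf('Could not open %s, skipping.\n', file_names{k});
        continue
    end
    % '%s' skips whitespace, so line breaks do not end up in the sequence
    dna_sequence = fscanf(fileID, '%s');
    fclose(fileID);

    % Positions at which the primer starts (empty if it does not occur)
    primer_locations{k} = strfind(dna_sequence, primer_sequence);
    if isempty(primer_locations{k})
        fprintf('%s: primer sequence not found.\n', file_names{k});
    else
        fprintf('%s: primer sequence found at position(s): %s\n', ...
            file_names{k}, num2str(primer_locations{k}));
    end
end
```

Checking the return value of fopen means a misspelled or missing file name is reported explicitly rather than causing an error in fscanf.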
2 Comments
Gabriela on 15 Mar 2024
Thank you!
Voss on 15 Mar 2024
You're welcome!

More Answers (1)

John D'Errico on 15 Mar 2024
Edited: John D'Errico on 15 Mar 2024
strfind does NOT read a string from a file! You did this:
% Open the file
fileID = fopen('cDNA1-1.txt', 'r');
% Read the DNA sequence from the file
dna_sequence = strfind(fileID, '%s');
WRONG. You opened the file, but then never read anything from the file. Essentially, you got ahead of yourself.
fileID = fopen('cDNA1-1.txt', 'r');
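I'll use fread, which by default reads the file byte by byte and returns the numeric (ASCII) codes as a column vector. So char will convert them back to characters, and the transpose makes the result a row vector. (There are many ways we could do this. I'm just grabbing one that works.)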
D = char(fread(fileID))';
% Close the file now that its contents are in D
fclose(fileID);
But note that the file contains carriage return and line feed characters, so I'll strip them out. Keep only the DNA part.
D = D(ismember(D,'ACGT'))
D = 'TCACTGACCCCACTCCTGAGCATGAACTCTCCTCCCCTCCACTCTGCTGTCAGGTTTTGTCTCCATTGGCCAAGAACCTCTTCCACCGGGCCATTTCTGAGAGTGGCGTGGCCCTCACTTCTGTTCTGGTGAAGAAAGGTGATGTCAAGCCCTTGGCTGAGCAAATTGCTATCACTGCTGGGTGCAAAACCACCACCTCTGCTGTCATGGTTCACTGCCTGCGACAGAAGACGGAAGAGGAGCTCTTGGAGACGACATTGAAAATGAAATTCTTATCTCTGGACTTACAGGGAGACCCCAGAGAGAGTCAACCCCTTCTGGGCACTGTGATTGATGGGATGCTGCTGCTGAAAACACCTGAAGAGCTTCAAGCTGAAAGGAATTTCCACACTGTCCCCTACATGGTCGGAATTAACAAGCAGGAGTTTGGCTGGTTGATTCCAATGCAGTTGATGAGCTATCCACTCTCCGAAGGGCAACTGGACCAGAAGACAGCCATGTCACTCCTGTGGAAGTCCTATCCCCTTGTTTGCATTGCTAAGGAACTGATTCCAGAAGCCACTGAGAAATACTTAGGAGGAACAGACGACACTGTCAAAAAGAAAGACCTGTTCCTGGACTTGATAGCAGATGTGATGTTTGGTGTCCCATCTGTGATTGTGGCCCGGAACCACAGAGATGCTGGAGCACCCACCTACATGTATGAGTTTCAGTACCGTCCAAGCTTCTCATCAGACATGAAACCCAAGACGGTGATAGGAGACCACGGGGATGAGCTCTTCTCCGTCTTTGGGGCCCCATTTTTAAAAGAGGGTGCCTCAGAAGAGGAGATCAGACTTAGCAAGATGGTGATGAAATTCTGGGCCAACTTTGCTCGCAATGGAAACCCCAATGGGGAAGGGCTGCCCCACTGGCCAGAGTACAACCAGAAGGAAGGGTATCTGCAGATTGGTGCCAACACCCAGGCGGCCCAGAAGCTGAAGGACAAAGAAGTAGCTTTCTGGACCAACCTCTTTGCCAAGAAGGCAGTGGAGAAGCCACCCCAGACAGAACACATAGAGCTGTGAATGAAGATCCAGCCGGCCTTGGGAGCCTGGAGGAGCAAAGACTGGGGTCTTTTGCGAAAGGGATTGCAGGTTCAGAAGGCATCTTACCATGGCTGGGGAATTGTCTGGTGGTGGGGGGCAGGGGACAGAGGCCATGAAGGAGCAAGTTTTGTATTTGTGACCTCAGCTTTGGGAATAAAGGATCTTTTGAAGGCCAAA'
strfind(D,'TCAG')
ans = 1×7
50 710 732 819 832 1139 1230
And those would be the locations of the substring 'TCAG' in your file. Swap in your own primer (for example 'TACG') as needed.
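As one of the "many ways" mentioned above, here is a minimal sketch that swaps fread for fileread, which opens, reads, and closes the file in a single call. It is an alternative, not the approach used in this answer. So that it runs on its own, it first writes a small demo file containing a made-up sequence; the names demo_file and raw_text are hypothetical.

```matlab
% Write a small demo file with CR/LF line endings (made-up sequence)
demo_file = [tempname, '.txt'];
fileID = fopen(demo_file, 'w');
fprintf(fileID, 'ACGTTACG\r\nGGTACGAT\r\n');
fclose(fileID);

% fileread returns the whole file as one character row vector
raw_text = fileread(demo_file);

% Keep only the nucleotide letters, dropping CR/LF and anything else
D = raw_text(ismember(raw_text, 'ACGT'));

% Locate the primer in the cleaned sequence
primer_location = strfind(D, 'TACG');
disp(primer_location)   % positions 5 and 11 for this demo sequence

% Remove the demo file again
delete(demo_file);
```

With a real data file, replace demo_file with the file name (for example 'cDNA1-1.txt') and drop the lines that create and delete the demo file.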
